Hacker News | fxwin's comments

Very off-putting readme/description, and even after forcing myself to read through it, I don't know why I should use this instead of Copilot/Continue/CC.

> One day I had a subtle bug: the app couldn’t load an index file living under WSL. I asked an AI for help. At first it answered with vague explanations I didn’t really understand. After pushing it to be explicit, I eventually realized what it was suggesting: "fix" the issue by deleting the index file whenever loading it fails.

> I challenged it: "So your plan is: we spend an hour indexing data, save a persistent index, and if we can’t reload it later, we just delete it and start over? That’s like Word saving a document… and then opening an empty file next time. That’s not a fix, that’s a destructive workaround." After more pressure, the bot finally admitted that yes, this was its plan, and backed off.

Even the (supposed) quote of yourself in your story is AI generated, I'm not even sure what to say at this point

> The AI can propose ideas, but it never silently edits your codebase or ships a destructive "fix" behind your back.

I don't know of any widely used tool that "silently edits your codebase"

> AICode is designed to be transparent, to acknowledge its limits, and to work with you to reach a reliable result.

Considering the general vagueness in the description, I will assume you haven't found a novel way of aligning models/enforcing guardrails, and "designed to" is just a fancy way of saying "instructed to"

> AICode does not upload your whole codebase contents to the cloud, because it runs primarily on your machine, connects directly to the OpenAI API servers, and sends only selected source code extracts

That is, in fact, "the cloud™", and every other tool already does this.


> The elephant in the room is that we’re all using AI to write but none of us wants to feel like we’re reading AI generated content.

My initial reaction to the first half of this sentence was "Uhh, no?", but then I realized it's on Substack, so it's probably more typical for that particular type of writer (writing to post, not writing to be read). I don't even let it write documentation or other technical things anymore because it kept getting small details wrong or subtly injecting meaning that isn't there.

The main problem for me isn't even the eye-roll-inducing phrases from the article (though they don't help), it's that LMs tend to subtly but meaningfully alter content, causing the effect of the text to be (at best slightly) misaligned with the effect I intended. It's sort of an uncanny valley for text.

Along with the problems above, manual writing also serves as a sort of "proof-of-work" establishing the credibility and meaning of an article - if you didn't bother taking the time to write it, why should I spend my time reading it?


Had the same thought reading this. I haven't found a place for LLMs in my writing and I'm sure many people have the same experience.

I'm sure it's great for pumping out SEO corporate blogposts. How many articles are out there already on the "hidden costs of micromanagement", to take an example from this post, and how many people actually read them? For original writing, if you don't have enough to say or can't [bother] putting your thoughts into coherent language, that's not something AI can truly help with in my experience. The result will be vague, wordy and inconsistent. No amount of patching-over, the kind of "deslopification" this post proposes, will help salvage something minimum work has been put into.


Indeed. I have never used an LLM to write. And coding agents are terrible at writing documentation, it's just bullet points with no context and unnecessary icons that are impossible to understand. There's no flow to the text, no actual reasoning (only confusing comments about changes made during the development that are absolutely irrelevant to the final work), and yet somehow too long.

The elephant in the room is that AI is allowing developers who previously half-assed their work to now quarter-ass it.


"Please write me some documentation for this code. Don't just give me a list of bullet points. Make sure you include some context. Don't include any icons. Make sure the text flows well and that there's actual reasoning. Don't include comments about changes made during development that are irrelevant to the final work. Try to keep it concise while respecting these rules."

I think many of the criticisms of LLMs come from shallow use of them. People just say "write some documentation" and then aren't happy with the result. But in many cases, you can fix the things you don't like with more precise prompting. You can also iterate a few rounds to improve the output instead of just accepting the first answer. I'm not saying LLMs are flawless. Just that there's a middle ground between "the documentation it produced was terrible" and "the documentation it produced was exactly how I would have written it".


Believe me, I've tried. By the time I get the documentation to be the way I want it, I am no longer faster than if I had just written it myself, with a much more annoying process along the way. Models have a place (e.g. for fixing formatting or filling out, say, sample JSON returns), but for almost anything actually core content related I still find them lacking.

I guarantee if you give me your prompt and the output you got I can fix it and get you a 10x better output in less than 5 minutes.

DM me on substack if you don't wanna post it here, I'm honestly happy to help wherever I can.


I won't share work related stuff for obvious reasons, but feel free to post an example of some LLM generated (technical) article or report of yours (I also doubt you would be able to understand the subtle differences I take issue with in LLM output in 5 minutes in the first place).

But are you gaining a meaningful amount of time, and are your coworkers that thorough?

Honestly I just don't read documentation three of my coworkers put on anymore (33% of my team). I already spend way too much time fixing the small coding issues I find in their PRs to also read their tests and docs. It's not their fault: some of them are pretty new, and the others always took time to understand stuff and their output was always below average in quality in general (their people/soft skills are great, and they have other qualities that balance the team).


Why not write it yourself?

Sure, but that's part of my point. It gives a facade of attention to detail (on the part of the dev) where there was none.

OP here. You're absolutely right!

Most people drop a one line prompt like "write amazing article on climate change. make no mistakes" and wonder why it's unreadable.

Just like writing manually, it's an iterative approach and you're not gonna get it right the first, second or third time. But over time you'll get how the model thinks.

The irony is that people talk about being lazy for using LLMs but they're too lazy to even write a detailed prompt.


I have tried using them, both for technical documentation (think Readme.md) and for more expository material (think wiki articles), and bounced off of them pretty quickly. They're too verbose and focus on the wrong things for the former, where output is intended to get people up to speed quickly, and suffer from the things I mentioned above for the latter, causing me to have to rewrite a lot, which is more frustrating than just writing it myself in the first place.

That's without even mentioning the personal advantages you get from distilling notes, structuring and writing things yourself, which you get even if nobody ever reads what you write.


> > The elephant in the room is that we’re all using AI to write but none of us wants to feel like we’re reading AI generated content.

Reminds me of a quote from St. Augustine's autobiography, "Confessions":

"I have known many men who wished to deceive, but none who wished to be deceived."


> it seems linking to a copy that claims the dataset is public domain, would be problematic copyright-wise.

Would it? Sounds to me like the blame lies on the person uploading the dataset under that license, unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'


> unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'

Yes there's an expectation that you put in some minimum amount of effort. The license issue here is not subtle, the Kaggle page says they just downloaded the eBooks and converted them to txt. The author is clearly familiar enough with HP to know that it's not old enough to be public domain, and the Kaggle page makes it pretty clear that they didn't get some kind of special permission.

If you want to get more specific on the legal side then copyright infringement does not require that you _knew_ you were infringing on the copyright, it's still infringement either way and you can be made to pay damages. It's entirely on you to verify the license.


> unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'

Why wouldn't that apply?


I'm not a copyright expert and if you told me that Harry Potter was public domain then I'd probably be a bit surprised but wouldn't think it's crazy. The first book came out 30 years ago after all. On further research the copyright laws are way more aggressive than that (a bit too much if you ask me) but 30 years doesn't seem quick. Patents expire after 20 years.

It would be incredibly naive to assume that a moneymaker like that is PD.

Sherlock Holmes is public domain and there are still shows being announced

New Sherlock Holmes works are copyrighted. Not by Conan Doyle...

I find this fascinating, as I keep observing that there are pretty widespread differences between what people believe copyright does and what the law actually says.

The Berne Convention (author's life + 50 years) is the baseline for the copyright laws in most countries. Many countries have a longer copyright period than Berne.

https://en.wikipedia.org/wiki/List_of_copyright_duration_by_...


I think even people who don't care about how broken the copyright system is understand intuitively that huge commercial properties that are contemporaneous with themselves are protected. They don't need to know any details to know that these properties belong to massive companies and aren't free for the taking.

How many people think they can rip off Disney characters even if they don't know how much Disney lobbied to extend their ownership? People can observe that no one but Disney gets to use them and understand, even if not consciously, that those are Disney's to use.

^ Probably poorly written without time to proof cause time constraint.


It is a media franchise for children, and there are many elements, and trademarks in addition to copyrights. I think most fans understand the bright line that stops them copying an entire book or film work, unless their dad has a Roku at home.

But there are over 34,000 images uploaded to the Fandom.com site alone. There are character bios and generous quotes from films and books. Countless fans are using elements in memes and avatars and social media posts.

Fan-fiction abounds, where the characters and scenarios are endlessly remixed and mashed up with other fandoms.

Quidditch... simulated... is a collegiate sport, but they had to rename it.

Even on the official Wizarding World site, you can make custom downloadable stuff. Not long ago, you could freely download wallpapers. You can get free clips and trailers on any video site.

News outlets had a difficult time explaining the "Public Domain" status of Mickey Mouse and Betty Boop with the new years. Because Mickey Mouse and Betty Boop, the characters, aren't the things which are copyrighted, and the characters' status didn't change with the new year.

I would bet that the typefaces in the official books have their own copyrights, and the book binding processes are patented.


The article author and the uploader should _BOTH_ be sentient enough to engage brain and not just ignore it because they feel "it's an abstract concept I'd not get in trouble for when not working in the US or EU".

Copyright infringement is a strict liability tort in the US. Willful infringement can result in harsher penalties, but being mistaken about the copyright status is not a valid defense.

I don't know if you're trying to say that, in the realm of tort law, it is only strict liability, or if you are saying that copyright infringement is only a tort. If it's the latter, it's completely untrue, as there are criminal copyright infringement statutes.

I feel like the title is a bit misleading, unless the person who put all HP books on Kaggle as a (supposedly) CC0-licensed data set did so as a Microsoft employee.

Nevertheless pretty egregious oversight (incompetence?) and something that shouldn't have been published.


What makes this different from linking to a random zip file somewhere?

Microsoft could have used any dataset for their blog, they could have even chosen to use actual public domain novels. Instead, they opted to use copyrighted works that JK hasn't released into the public domain (unless user "Shubham Maindola" is JK's alter ego).

Rowling is known for using pseudonyms. Maybe she got tired of writing and decided to break into LLM tech.

The licensing: If I steal something and tell you it's free and yours for the taking, that feels different than a fence (knowingly) buying stolen goods. It's obviously semantics and there should have been some better judgment from MS, but downloading a dataset (stated as public domain) from Kaggle feels spiritually different from piracy (e.g.: if someone uploads a less known, copyrighted data set to Kaggle/Hugging Face under an incorrect license, are tutorials that use this data set a 'guide to pirating' this data set? To me, that feels like a wrong use of the term).

The licence?

If it comes from a site claiming it was under a licence when it was not, the misdeed is done by the person who provided the version carrying the licence.


Just because it says "CC0" does not make it CC0. If you upload a dataset you don't have the rights to, any license declaration you make is null and void, and anyone using it as if it had that license is violating copyright.

Even if MS could claim that they were acting in good faith, there really isn't much legal wiggle room for that. But it doesn't even come to that, because I don't think anyone would buy that they really thought that the Harry Potter books were under CC0.


If you buy a pirated book on Amazon you get to keep the book and the pirate printer is the one prosecuted.

Same thing applies here.

Up to 80% of all works that are in copyright terms are accidentally in the public domain. A well known example is Night of the Living Dead. It is not your job to check that the copyright on a work you use is the correct one.


The only reason you get to keep the book is because no one bothers to enforce the law; that doesn't make it legal.

And it is your job to check that you have the rights to use other people's work. Ignorance is not a defence.


>the law

Which ones? As far as I was aware, it's a crime to redistribute copyrighted works, not receive.


Copyright act 1968. Sect 116.

Section 116 (2) A plaintiff is not entitled by virtue of this section to any damages or to any other pecuniary remedy, other than costs, if it is established that, at the time of the conversion or detention:

(a) the defendant was not aware, and had no reasonable grounds for suspecting, that copyright subsisted in the work or other subject - matter to which the action relates;

(b) where the articles converted or detained were infringing copies--the defendant believed, and had reasonable grounds for believing, that they were not infringing copies; or

(c) where an article converted or detained was a device used or intended to be used for making articles--the defendant believed, and had reasonable grounds for believing, that the articles so made or intended to be made were not or would not be, as the case may be, infringing copies.

Does this not mean the opposite of your claim? It sounds to me that if you unwittingly bought a dodgy copy of something, the law thinks the copyright owner can get you to pay for a legit copy, but not punish you for your mistake.

In the specific case of the Harry Potter works, the fame might meet the threshold of reasonable grounds for believing, but noosphr's argument that "Up to 80% of all works that are in copyright terms are accidentally in the public domain" could grant reasonable grounds for believing it is not.

This is one of those things that causes interesting court cases because reasonable grounds for believing X is not the same thing as no reasonable grounds for believing not X. Reasonable grounds for suspicion probably carries more weight here than reasonable grounds for the absence of suspicion, but cases have hung on things like this before, like the presence or absence of an Oxford comma.


Australia doesn't have fair use either. Who cares what a country smaller than California in population and economy does?

Oh come on. The licence was obviously incorrect and you can't escape culpability because of that.

The 'artwork' they generated and the text on the blog post?

To clarify: Microsoft linked to a dataset on Kaggle, which is falsely labeled CC0 (Public Domain). It's the fault of the user who uploaded the dataset and misrepresented the licensing.

Multiple failures. One on the writer of the blog for even for a moment considering that such a data set would be legal. And the next on MS for hiring a person with such poor judgement, who then publicly posted about it on a company platform instead of choosing some other data set.

The original title was "LangChain Integration for Vector Support for SQL-based AI applications"

For some reason I really like this.

Despite the name, I wouldn't say it is "about" Julia sets, at least not any more than it is about any other kind of fractal


1) The first 2 links in the readme are 404s

2) I feel like the very first question this project needs to answer is: "Why should I use this over the official AWS CDK?" (assuming you want other people to use this). Besides maybe some nicer syntax I don't really see a reason to use this and lose all the (sometimes lacking) documentation, examples + community support that exists for CDK.

2.1) The listed benefits include "Use your favorite [...] type checker", but the example uses strings for specifying field types (vs. AWS CDK which would use enums, and is also "pure python") which immediately throws that out of the window


Thanks for having a look.

1. Links are fixed now.

2. We hopefully answer that better on our website https://stelvio.dev but you're right, we should do it better in the readme too!
- Can you please be more specific regarding syntax? What don't you like, or what would you like to see?
- The whole point of Stelvio is that you need to write much less code to define your infra, e.g. almost no IAM code thanks to our linking system. We also have a "dev mode" which lets you test/debug your lambdas locally without re-deployment.
- We try to have comprehensive documentation and guides covering each component. If you see something missing, can you please be more specific?
- Regarding community support, we can't do much here other than grow the community, which we're trying to do. Having said that, we're happy to support our users personally and answer any questions or help you onboard. Just shoot an email to team@stelvio.dev

2.1 Many parameters/fields/properties in Stelvio have the option to use either an enum or a string. If it's a string with specific supported values, it's defined as e.g. `Literal["keys-only", "new-image", "old-image", "new-and-old-images"]`, so you get help from the IDE with auto-complete as well as from the type-checker/linter if you use a wrong value.
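For readers unfamiliar with `Literal`, here is a minimal sketch of how a string-valued parameter can still be type-checked. `StreamView` and `make_table` are hypothetical names made up for illustration and are not Stelvio's actual API; only the four string values come from the comment above.

```python
from typing import Literal, get_args

# Hypothetical alias; the four allowed values are the ones quoted above.
StreamView = Literal["keys-only", "new-image", "old-image", "new-and-old-images"]


def make_table(name: str, stream: StreamView) -> dict:
    """Hypothetical helper for illustration, not a Stelvio function."""
    # Literal is only enforced by static checkers, so validate at runtime too.
    if stream not in get_args(StreamView):
        raise ValueError(f"unsupported stream view: {stream!r}")
    return {"name": name, "stream": stream}


# Accepted by a static checker such as mypy or pyright.
table = make_table("orders", "new-image")

# A checker would flag the next call before it runs, because "new_image"
# is not one of the allowed literals:
# make_table("orders", "new_image")
```

The runtime check is there because `Literal` alone does nothing when the code executes; it only informs the IDE and the type checker.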

Thank you again for your valuable feedback. Happy to talk anytime. If you want to try Stelvio just shoot us an email and we'll help you along the way.


> We also have "dev mode" which lets you test/debug your lambdas locally without re-deployment

As do both SAM and the CDK.

https://docs.aws.amazon.com/cdk/v2/guide/testing-locally-get...

But this is 2026. You should be deploying Docker containers to Lambda and those are really easy to test locally


Docker has a large cold start time usually, sometimes it can take up to 10 seconds for a dockerized Python lambda to invoke. Are there solutions for that, that include docker?


Fair point, I’m working on a project now that doesn’t require responsiveness - it’s not user facing. We were burned so bad by Lambda cold starts for user facing lambdas that were backing websites we said forget it and just used ECS/fargate. This is my go to template - I didn’t write it. I found it 7 years ago.

https://github.com/1Strategy/fargate-cloudformation-example/...

You just update the parameter for your container image and redeploy.

Of course you have provisioned concurrency that can help.

But the easiest way to test Python lambdas locally is just to run it like any other Python script and call your event handler with the event you want to test
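A minimal sketch of what that can look like. The handler body and the event shape are invented for illustration; the only real convention relied on is that a Python Lambda handler is a plain function taking `event` and `context`.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler: greets the name found in the event."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello {name}"})}


if __name__ == "__main__":
    # A hand-written test event; no AWS services are involved.
    test_event = {"name": "local"}
    # This handler never touches the context object, so None is enough here.
    print(handler(test_event, None))
```

A handler that reads attributes from `context` would need a small stand-in object instead of `None`.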


The manifesto link in the README still points to https://stelvio.dev/manifesto/, which should be about/manifesto/


thanks for taking time to let us know. not sure what's going on I checked on multiple devices and it links correctly.


Are you sure you checked both links? There's one at the top of the README that's linked correctly for me and one towards the bottom that is not


You're right, the second link was broken. Fixed now. Thank you so much for taking the time to check this again. If you need any help with Stelvio or have any more feedback to share, please hit us up at team@stelvio.dev. Thank you again!


> Citing a paper to defend your position is just an appeal to authority (a fallacy that they teach you about in the same class).

an appeal to authority is fallacious when the authority is unqualified for the subject at hand. Citing a paper from a philosopher to support a point isn't fallacious, but "<philosophical statement> because my biology professor said so" is.


This doesn't really seem 'generic' at all to me?


fyi regional trains (which the deutschlandticket is valid for) are very punctual, it is the long distance/ICE trains that are always late/broken, and you cannot ride those with the deutschlandticket anyways.


no they are not. source: i am german and i use regional trains occasionally


thats great, but they are on time 85% of the time vs long distance trains' 62%

https://www.deutschebahn.com/de/konzern/konzernprofil/zahlen...

see my other comment too


If you take a train to work five days a week and it's "on time" (not delayed by 6 minutes or more) 85% of the time, you'll be late on at least one day most weeks. Hardly very punctual.
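The arithmetic behind that claim can be checked in a few lines. This is a rough sketch that assumes every trip is independently on time with probability 0.85, which real delays are not; the 85% figure is the one cited earlier in the thread.

```python
p_on_time = 0.85  # per-trip punctuality figure cited in the thread

# Assumption: trips are independent of each other.
p_late_one_way = 1 - p_on_time ** 5      # five trips a week (to work only)
p_late_round_trip = 1 - p_on_time ** 10  # ten trips a week (there and back)

print(f"{p_late_one_way:.3f}")     # 0.556
print(f"{p_late_round_trip:.3f}")  # 0.803
```

Counting only the trip to work, at least one delay in a given week is slightly more likely than not; counting both directions, it happens in roughly four weeks out of five.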

Personally, I think they should just abandon timetables, run trains as fast as they can, and if you need to be somewhere by a certain time, you give the planner a target reliability and it uses a probabilistic model of the entire system to tell you when to leave so you can arrive on time (0 minutes delay, or earlier) with that given probability.
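As a toy version of that idea, the sketch below swaps the system-wide probabilistic model for something much simpler: an empirical quantile over past door-to-door journey times. The function name `latest_departure` and the sample durations are invented for illustration.

```python
import math


def latest_departure(deadline_min: float, travel_times_min: list[float], reliability: float) -> float:
    """Latest departure (minutes since midnight) that would have met the
    deadline in at least `reliability` of the sampled journeys."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    ordered = sorted(travel_times_min)
    # Smallest sampled duration that covers at least `reliability` of the samples.
    k = math.ceil(reliability * len(ordered)) - 1
    return deadline_min - ordered[k]


# Invented journey durations in minutes, purely illustrative.
samples = [30, 31, 31, 32, 33, 35, 38, 42, 50, 65]

# Deadline 09:00 (540 minutes after midnight), target reliability 90%.
print(latest_departure(9 * 60, samples, 0.9))  # 490, i.e. leave by 08:10
```

Asking for 100% reliability on the same samples pushes the departure back to 07:55, which shows how expensive the last few percent are when the delay distribution has a long tail.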


true, the actual word used is less important to me than the distinction between long distance trains and regional trains, since those get conflated quite a bit in this discussion.


Most local and S-Bahn trains in Germany are pretty decent, the data is pretty clear on this. It's not Swiss level but still pretty good. Nothing compared to ICE.


not sure what you count RB/RE as, but they are absolutely broken as well in my experience.


The german trains, even at their worst, are so much better than anything in the US. Complaining can also be a sport in Germany. Take a ride on Njtransit or the NYC subway to appreciate the difference. Or try to get anywhere in New Jersey without a car. In many parts of Germany, you can get almost anywhere conveniently with only public transportation.


It's probably worse if it was once reliable and now not, compared to if it's never reliable: if it's never reliable, you've been trained to have a huge safety margin and backup plans, if it's reliable and suddenly it messes up, you're thrown in a new situation and have to think "Shit, what do I do now?". Probably very stressful, and it leads people to avoid the service altogether.

Although apparently NYC subways used to be better too.


what’s going on in New York is irrelevant. The trains in Germany are largely bad. Bad enough that I don’t use them unless I have to. Once they’re at that stage it doesn’t matter how much worse they get for me, I still won’t use them.


I can't say what your experience is and what 'absolutely broken' means. There is data on these things. I can only tell you what the data says. Could be you are in a region that is worse than others. Or your definition of 'absolutely broken' is different than most people's.


Are you crazy? I use local trains daily and they are anything but punctual. Also, S-Bahn? Worst service ever.


idk what to tell you except that your personal experience does not generalize, see https://www.deutschebahn.com/de/konzern/konzernprofil/zahlen...

the regional trains run by regional orgs rather than db get similar results, e.g. bwegt in baden württemberg or beg in bavaria

https://beg.bahnland-bayern.de/de/aufgaben/kontrollieren/pue...

https://vm.baden-wuerttemberg.de/de/mobilitaet-verkehr/bahn-...


I've seen your list before and find it much easier to appreciate than the OP tbh. It is very concise, the descriptions actually describe what one might learn or struggle with, and each project comes with resources to get started with (One day I might even get around to doing one of these ;)

The OP very much comes off to me as a "here are 100 books you need to read before you die" recommendation porn type of post where the author has done none of the things listed.


The OP link feels like a list you scroll until you see something that interests you, and you jump on that. An ideaboard.

The link in this chain feels like a mini-curriculum. AKA "you do all these 7 things and you'll probably become very good at any job". a decent university will probably have you do 4-5 out of these projects (making a spreadsheet program is truly a huge feat, though).

They both have some use, but different use cases in my eyes.


Agreed this is more appealing to read and visually look through even.

