> The code I generate is usually better than what I'd do by hand.
I'm always baffled by this. If you can't do it that well by hand, how can you discriminate its quality so confidently?
I get there is an artist/art consumer analogy to be made (i.e. you can see a piece is good without knowing how to paint), but I'm not convinced it is transferable to code.
Also, not really my experience when dealing with IaC or (complex) data related code.
You're forgetting that code quality also costs time. Developers make tradeoffs all the time about how much time to invest in improving the quality of what they write, for both new and existing code. When someone claims that LLMs can produce higher-quality code, that can include quality levels that would be unjustifiably slow to hand-craft, depending on constraints and needs.
Related - agentic LLMs may be slow to produce output but they are parallelizable by an individual unlike hand-written work.
I get that. I'm exclusively talking about verifying code quality after it has been written, whether by a human or an LLM; in fact I don't really care by whom. Mainly because I do care about introducing tech debt and/or hidden ballooning costs.
Ah, alright, that makes a lot more sense, like another poster said I read "'d" as "could".
Point still remains for junior and semi-senior devs though, or any dev trying to leap over a knowledge barrier with LLMs. Emphasis on good pipelines and human (eventually maybe also LLM-based) peer reviews will be very important in the years to come.
You underestimate how lazy people are. I always take shortcuts and skip taking edge cases into account. LLMs have no problem writing tedious guards and creating abstractions without hacks, which means the code becomes more robust than if I wrote it by hand.
What an odd question. For the exact same reason people who write prose professionally usually have someone else edit their work: because editing your own work is harder, and everybody slips up sometimes.
I'm not getting this analogy. Editors can't normally tell whether the content itself is good (after all, the writer is the SME); they can only perfect its form (syntax, grammar, etc.).
Well-written bullshit in perfect prose is still bullshit.
ehhhhhhh yeah but this is like hiring Reddit to do your prose editing, considering generated code is slightly worse than what you'd find on r/programming
You can believe that or not believe that without changing the implication of the previous question, which was that someone who routinely slips while writing code would be incapable of determining whether the LLM got it right. Obviously not.
I am pattern matching your last statement with what I've seen with my teammates who are more AI-oriented: I suspect this is a matter of making the metrics the goal. I would rather maintain something that is simple, works, and has targeted comments than something messy that meets the metrics you list.
I don't get all the prompt vibe coding going around. I don't use prompts to generate code.
I use "tab-tab" auto complete to speed through refactorings and adding new fields / plumbing.
It's easily a 3x productivity gain. On a good day it might be 10x.
It gets me through boring tedium. It gets strings and method names right for languages that aren't statically typed. For languages that are statically typed, it's still better than the best IDE AST understanding.
It won't replace the design and engineering work I do to scope out active-active systems of record, but it'll help me when time comes to build.
I use tab auto complete, and I think it's a 5% productivity gain. On a good day, maybe 10%. I haven't put much effort into optimizing the setup or learning advanced usage patterns or anything. I'm using stock Copilot, provided by my employer. If I had to pay for it, I wouldn't be using it, as it doesn't justify the cost.
The 5% is an increase in straight-ahead code speed. I spend a small fraction of my time typing code. Smaller than I'd like.
And it very well might be an economically rational subscription. For me personally, I'm subscription averse based on the overhead of remembering that I have a subscription and managing it.
I can't attest to C++, but we've got a large Rust monorepo, and it's magical.
It expands match blocks against highly complex enums from different crates, then tab completes test cases after I write the first one. Sometimes even before that.
We may be at different levels of "large" (and "gnarly") - this code-base has existed in some form since 1985, through various automated translations Pascal -> C -> C++.
Just by virtue of Rust being relatively short-lived I would guess that your code base is modular enough to live inside reasonable context limits, and written following mostly standard practice.
One of the main files I work on is ~40k lines of code, and one of the main proprietary API headers I consume is ~40k lines of code.
My attempts at getting the models available to Copilot to author functions for me have often failed spectacularly - as in I can't even get it to generate edits at prescribed places in the source code or to follow examples from prescribed places. And the hallucination issue is EXTREME when trying to use the big C API I alluded to.
That said Claude Code (which I don't have access to at work) has been pretty impressive (although not what I would call "magical") on personal C++ projects. I don't have Opus, though.
Prompts are worth mastering. AI autocomplete is better than older autocomplete systems but of course it only works based on what you started to type.
Prompts are especially good for building a structural template for a new code module, or basic boilerplate for some of the more verbose environments. E.g. Android Java programming can be a mess: huge amounts of code for something simple like an efficient scrolling view. AI takes care of this. It's obvious code, no thought, but it's still over 100 lines scattered across XML (the view definitions), resources, and multiple Java files.
Do you really want to be copying boilerplate like this across to many different files? Prompts that are well integrated to the IDE (they give a diff to add the code) are great (also old style Android before Jetpack sucked) https://stackoverflow.com/questions/40584424/simple-android-...
Do you have a link to some of the code that you have produced using this approach? I am yet to see a public or private repo with non-trivial generated code that is not fundamentally flawed.
I took an existing MIT licensed prefix tree crate and had Claude+Gemini rewrite it to support immutable, quickly comparable views. The execution took about one day's work, following two or three weeks of thinking about the problem part time. I scoured the prefix tree libraries available in Rust, as well as the various existing immutable collections libraries, and found that nothing like this existed. I wanted O(1) comparable views into a prefix tree. This implementation has decently comprehensive tests and benchmarks.
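The crate itself isn't linked, but the core idea behind O(1) comparable views can be sketched in a few lines: tag every distinct version of a persistent structure with a monotonically increasing id and compare views by id instead of walking the structure. Everything below is hypothetical (the names `View` and `insert`, and a dict standing in for the actual trie), just a minimal sketch of the technique:

```python
import itertools

_version = itertools.count()

class View:
    """A persistent map whose views compare in O(1) via a version id."""
    def __init__(self, data):
        self._data = data              # treated as immutable once stored
        self._id = next(_version)      # unique id per distinct version

    def insert(self, key, value):
        # A real prefix tree would share structure instead of copying.
        new_data = dict(self._data)
        new_data[key] = value
        return View(new_data)

    def get(self, key):
        return self._data.get(key)

    def __eq__(self, other):
        # Identity-style comparison: one integer check, no structural walk.
        return self._id == other._id

v1 = View({})
v2 = v1.insert("app", 1)
v3 = v2
assert v2 == v3        # same version: equal in O(1)
assert not (v1 == v2)  # different versions: unequal, also O(1)
```

The tradeoff is that equality becomes identity of versions rather than structural equality: two views built independently with the same contents compare unequal, which is usually acceptable when the views all derive from one shared tree.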
No code for the next two but definitely results...
In both these examples, I leaned on Claude to set up the boilerplate, the GUI, etc, which gave me more mental budget for playing with the challenging aspects of the problem. For example, the tabu graph layout is inspired by several papers, but I was able to iterate really quickly with claude on new ideas from my own creative imagination with the problem. A few of them actually turned out really well.
Not the OP, not my code. But here is Mitchel Hashimoto showing his workflow and code in Zig, created with AI agent assistance: https://youtu.be/XyQ4ZTS5dGw
I think this is still some kind of 'fight' between assisted coding and something closer to 'vibe'. Vibe, for me, means not reading the generated code, just trying it; the other extreme is writing everything without AI. I don't think people here are talking about assisted: they are talking about vibe or almost-vibe coding. And it's fairly terrible if the LLM does not have tons of info. It can loop, hang, remove tons of features, break random things, etc., all while being cheerful and saying 'this is production code now, ready to deploy'. And people believe it. When you use it to assist, it is great imho.
https://github.com/wglb/gemini-chat Almost entirely generated by Gemini based on my English-language description. Several rounds, with me adding requirements.
That's disingenuous or naive. Almost nobody decides to expressly highlight the section of code (or whole files generated by ai) they just get on with the job when there's real deadlines and it's not about coding for the sake of the art form...
If the generated implementation is not good, you're trading short-term "getting on with the job" and "real deadlines" for mid-to-long-term slowdown and missed deadlines.
In other words, it matters whether the AI is creating technical debt.
Do you want to clarify your original comment, then? I just read it again, and it really sounds like you're saying that asking to review AI-generated code is "disingenuous or naive".
I am talking about correctness, not style. Coding isn't just about being able to show activity (code produced), but about producing a system that correctly performs the intended task.
Yes, and frankly you should be spending time writing large integration tests correctly, not microscopic tests that forget how tools interact.
It's not about lines of code or quality it's about solving a problem. If the problem creates another problem then it's bad code. If it solves the problem without causing that then great. Move onto the next problem.
Same as pretending that vibe coding isn't producing tons of slop. "Just improve your prompt bro" doesn't work for most real codebases. The recent TEA app leak is a good example of vibe coding gone wrong, I wish I had as much copium as vibe coders to be blind to these things, as most of them clearly are like "it happened to them but surely won't happen to ME."
> The recent TEA app leak is a good example of vibe coding gone wrong
Weren't there 2 or 3 dating apps that were launched before the "vibecoding" craze that went extremely popular and got extremely hacked weeks/months in? I also distinctly remember a social network having firebase global tokens on the clientside, also a few years ago.
Not an excuse, no. I agree it should be better. And it will get better. Just pointing out that some mistakes were systematically happening before vibecoding became a thing.
We went from "this thing is a stochastic parrot that gives you poems and famous people styled text, but not much else" to "here's a fullstack app, it may have some security issues but otherwise it mainly works" in 2.5 years. People expect perfection, and move the goalposts. Give it a second. Learn what it can do today, adapt, prepare for what it can do tomorrow.
No one is moving the goalposts. There are a ton of people and companies trying to replace large swathes of workers with AI. So it's very reasonable to point out ways in which the AI's output does not measure up to that of those workers.
I thought the idea was that AI would make us collectively better off, not flood the zone with technical debt as if thousands of newly minted CS/bootcamp graduates were unleashed without any supervision.
LLMs are still stochastic parrots, though highly impressive and occasionally useful ones. LLMs are not going to solve problems like "what is the correct security model for this application given this use case".
AI might get there at some point, but it won't be solely based on LLMs.
> "what is the correct security model for this application given this use case".
Frankly I've seen LLMs answer better than people trained in security theatre so be very careful where you draw the line.
If you're trying to say they struggle with what they've not seen before: yes, provided that what is new isn't within the phase space they've been trained over. Remember, there are no photographs of cats riding dinosaurs, but SD models can generate them.
I've heard this multiple times (Tea being an example of problems with vibe coding) but my understanding was that the Tea app issues well predated vibe coding.
I have experimented with vibe coding. With Claude Code I could produce a useful and usable small React/TS application, but it was hard to maintain and extend beyond a fairly low level of complexity. I totally agree that vibe coding (at the moment) is producing a lot of slop code, I just don't think Tea is an example of it from what I understand.
    # loop over the images
    for filename in images_filenames:
        # download the image
        image = download_image(filename)
        # resize the image
        resize_image(image)
        # upload the image
        upload_image(image)
They're often repetitive if you're reading the code, but they're useful context that feeds back into the LLM. Often once the code is clear enough I'll delete them before pushing to production.
do you have proof of this being useful for the LLM? wouldn't you rather have it re-read the actual code it generated, instead of relying on a potentially wishful or stale comment that could lead it astray?
it reads both, so with the comments it more or less parrots the desired outcome I explained... and it sometimes catches the mismatch between code and comment itself before I even mention it
I read and understand 100% of the code it outputs, so I'm not so worried about falling too far astray...
being too prescriptive about it (like prompting "don't write comments") makes the output worse in my experience
I've noticed this too. They are often restatements of the line in verbal form, or intended for me, the prompt author reading the LLM's output, rather than for a code maintainer.
Very often, comments generated by humans are also useless. The reason for this is mandated comment policies, e.g. 'every public method should have a comment'. An utterly disgusting practice. One should only write a comment when one has something interesting to say. In a not-overly-complex code base there should be a comment maybe every 100 lines or so. In many cases it makes more sense to comment the unit tests than the code.
I think the rule requiring comments on public methods exists so that tools like doxygen can extract a reference from them. Most IDEs can also display them on hover. And comments can remind the caller of pre- and post-conditions.
I am pretty far to one end of the spectrum on need for comments. Very rarely is a comment useful to help you/another developer decipher the intent and function of a piece of code.
Ah, so it's good enough to write code on its own without time-consuming, excessive hand-holding. But it's not good enough to write comments on its own.
I can't speak to comments rules specifically but I am a heavy user of "agentic" coding and use rules files and while they help they are simply not that reliable. For something like comments that's probably not that big of a deal because some extra bad comments isn't the end of the world.
But I have rules that are quite important for successfully completing a task by my standards and it's very frustrating when the LLM randomly ignores them. In a previous comment I explained my experiences in more detail but depending on the circumstances instruction compliance is 9/10 times at best, with some instructions/tasks as poor as 6/10 in the most "demanding" scenarios particularly as the context window fills up during a longer agentic run.
Me: Here's the relevant part of the code, add this simple feature.
Opus: here's the modified code blah blah bs bs
Me: Will this work?
Opus: There's a fundamental flaw in blah bleh bs bs here's the fix, but I only generate part of the code, go hunt for the lines to make the changes yourself.
Me: did you change anything from the original logic?
Opus: I added this part, do you want me to leave it as it was?
Sorry to be that guy, but you're using it wrong. The best flows right now are architect -> act -> test. First you have a session in "architect" / "plan" mode (depending on your IDE/tool) where you discuss, ask questions, etc. Then, when everything is clear in "chat" mode, you ask the model to make a plan. You verify the plan, and then you tell it to start implementing it. You still get to approve tool calls, tests, etc. You can also provide feedback along the way if you missed something (e.g. use uv instead of pip).
Coding in a chat interface, and expecting the same results as with dedicated tools is ... 1-1.5 years old at this point. It might work, but your results will be subpar.
Nah it's good thanks for your input. I saw people use plan.md and todo.md and ide/commandline for this before. manus.ai demonstrates this via its chat interface as well.
These conversations on AI code good, vs AI code bad constantly keep cropping up.
I feel we need to build a cultural norm to share examples places of succeeded, and failures, so that we can get to some sort of comparison and categorization.
The sharing also has to be made non-contentious, so that we get a multitude of examples. Otherwise we’d get nerd-sniped into arguing the specifics of a single case.
Let’s talk about rules and docs, shall we? What makes a good rule for AI to keep it on task? What are your setups for docs and attaching them to the context (do you need to? Or just the location?)
Let’s boil this down to an easy set of reproducible steps any engineer can take to wrangle some sense from their AI trip.
The company I work at (https://getunblocked.com) is built to give tools like Claude Code and Cursor context based on all your docs, issues, code, and chat threads from Slack and soon Teams. Happy to give you a demo sometime if you're interested!
In my experience, unit tests and logging code generated by LLMs tend to be overly verbose, miss meaningful assertions, and often produce boilerplate that looks correct but doesn’t test or log anything useful. It’s easy to get misled by the surface structure.
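A concrete illustration of that failure mode, using a made-up helper (`parse_price` and both test names are hypothetical): both tests below pass, but only the second would catch a regression, because the first asserts nothing about the actual value.

```python
def parse_price(text):
    """Hypothetical helper: parse a string like '$1,234.50' into a float."""
    return float(text.replace("$", "").replace(",", ""))

# Boilerplate that looks like a test but asserts almost nothing:
def test_parse_price_runs():
    result = parse_price("$1,234.50")
    assert result is not None      # passes for any non-None return value

# A meaningful assertion about actual behavior:
def test_parse_price_value():
    assert parse_price("$1,234.50") == 1234.5

test_parse_price_runs()
test_parse_price_value()
```

The surface structure of the two tests is nearly identical, which is exactly why the weak one is easy to wave through in review.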
I've been finding actual human-written bugs and correcting them with Claude, so I find the "often broken" claims a load of nonsense... I've been fixing dozens of minor bugs in our codebase that no one's been arsed to fix for years due to bigger priorities (which tbh is generating more features and tech debt).
It may change in the future, but AI is without a doubt improving our codebase right now. Maybe not 10X but it can easily 2X as long as you actually understand your codebase enough to explain it in writing.
Yeah there's so many now it's hard to settle on one. YouTube is littered with them. Agent OS, amp.code, BMAD. I'm probably trying BMAD in earnest next ...
Each of the "tools" does things slightly differently but the techniques to use them effectively are largely the same now (rules, planning, context management, good prompting).
You know like when the loom came out there were probably quite a few models but using it was similar. Like cars are now.
I do think a lot of the discourse in this space can be summed up as: people are arguing about two non-overlapping segments of a distribution having no idea the other segment even exists; instead they just assume the other side is [hype/pessimistic].
What a scary time it is for devs. We spent all this time learning this obscure skill and now when I play with claude or even chatgpt it makes really good code. I just asked it to write me a video game and it did it. Perfect godot code. I was stunned it didn't hallucinate and when I asked for clarification on a snippet of code, it perfectly answered.
I think it's only a matter of time until our roles are commoditized and vibe-coding becomes the norm in most industries.
Vibe coding being a dismissive term for what is really a new skillset. For example, we'll be doing more planning and testing and such instead of writing code. The same way, say, sysadmins just spin up k8s instead of racking servers, or car mechanics read diagnosis codes from readers and often just replace an electric part instead of hand-tuning carbs or gapping spark plugs. That is to say, a level of skill is being abstracted away.
I think we just have to see this, most likely, as how things will get done going forward.
Could you at least mention what the video game was, or why it was such a good implementation? Also, what was "perfect" about the code? "Perfect" is not a word I would ever use to describe code.
This reads like empty hype to me, and there's more than one claim like this in these threads, where AI magically creates an app, but any description of the app itself is always conspicuously missing.
Yes, I'm exaggerating, and it's not writing a AAA game from a prompt, but I asked it to make a game like Zelda and it figured it out and walked me through all the aspects of it. That's a lot more than I expected. I'm not a games programmer, so I'm probably a lot more impressed than I should be, but I went from not knowing anything about Godot to having a framework up to build a 2D RPG-esque game fairly quickly, learning as it gave me the code. Note, I used the new ChatGPT study mode, so that may be different than just regular prompts. I fully expected broken code and random AI musings, but instead I got a very solid implementation of a game, albeit a simple one. Or at least as simple as I asked for; I imagine I can keep building out more with its help.
I also have never used godot before, and I was surprised at how well it navigated and taught me the interface as well.
At least the horror stories about "all the code is broken and hallucinations" aren't really true for me and my uses so far. If LLMs succeed anywhere, it will be in the overly logical and predictable worlds of programming languages, but that's just a guess on my part. Thus far, whenever I reach out for code from LLMs, it's been a fairly positive experience.
Thanks for elaborating, this puts things into perspective, although the complexity of the end product is still unclear to me.
I do still disagree with your assessment. I think the syntactic tokens in programming languages have a kind of impedance mismatch with the tokens that LLMs operate on, and that the formal semantics of programming languages are a bad fit with fuzzy statistical LLMs. I firmly believe that increased LLM usage will drive software safety and quality down, simply because a) no semblance of semantic reasoning or formal verification has been applied to the code and b) a software developer will have an incomplete understanding of code not written by themselves.
But our opinions can co-exist, good luck in your game development journey!
I'm still playing with it and am now adding more scenes and more logic. I think the complexity here is whatever my goals are. I'm not sure what the practical limits are, or at least they exceed my own ability in games development right now. This is just a toy game, but as I reach into Claude and GPT, I can keep going, which is nice. I already have coding experience, so I'm not exactly a 'vibe coder', but professionally I don't think people with zero coding experience are getting dev roles; instead the role will change, like my examples of the modern mechanic and modern sysadmin above.
As far as QA goes, we then circle back to the tool itself being the cure for the problems the tool brings in, which is typical in technology. The same way agile/'break things' programming's solution to QA was to fire the 'hands on' QA department and then programmatically do QA. Mostly for cost savings, but partly because manual QA couldn't keep up.
I think like all artifacts in capitalism, this is 'good enough,' and as such the market will accept it. The same way my laggy buggy Windows computer would be laughable to some in the past. I know if you gave me this Win11 computer when I was big into low-footprint GUI linux desktop, I would have been very unimpressed, but now I'm used to it. Funny enough, I'm migrating back to kubuntu because Windows has become unfun and bloaty and every windows update feels a bit like gambling. But that's me. I'm not the typical market.
I think your concerns are real and correct factually and ideologically, but in terms of a capitalist market will not really matter in the end, and AI code is probably here to stay because it serves the capital owning class (lower labor costs/faster product = more profit for them). How the working class fares or if the consumer product isn't as good as it was will not matter either unless there's a huge pushback, which thus far hasn't happened (coders arent unionizing, consumers seem to accept bloaty buggy software as the norm). If anything the right-wing drift of STEM workers and the 'break things' ideology of development has primed the market for lower-quality AI products and AI-based workforces.
First thing I do is tell the LLM to stop writing useless docstrings and comments and instead follow clean code principles, where each variable name is a noun and each function name is a verb.
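As a minimal sketch of that naming rule (all names here are hypothetical): variables are nouns describing what they hold, functions are verb phrases describing what they do, and line-by-line comments become redundant.

```python
def filter_active_users(users):  # verb phrase: the action performed
    # The variable name says what it holds; no restating comment needed.
    active_users = [u for u in users if u["active"]]
    return active_users

user_records = [
    {"name": "ada", "active": True},
    {"name": "bob", "active": False},
]
assert filter_active_users(user_records) == [{"name": "ada", "active": True}]
```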
With enough rules and good prompting this is not true. The code I generate is usually better than what I'd do by hand.
The reason the code is better is that all the extra polish and gold plating is essentially free.
Everything I generate comes out commented, with great error handling, logging, SOLID principles, and unit tests following established patterns in the code base.