> <good> code is easier to understand, to debug, extend, and maintain [by people]
> But what if the next “person” isn’t a person?
There's certainly a hypothetical future where AI writes and maintains enterprise software and airline baggage control systems consisting of millions of lines of spaghetti code, code that violates all the principles of good software design that we currently value, and everything turns out peachy.
Nobody but the AI understands the code, but it mostly works, and the AI fixes it when it doesn't. We lose the capability to understand and test the code ourselves, but the AI says "trust me bro - it's all been tested" (even though bugs keep turning up).
But, I doubt it.
First off, never mind all the "parroting" nonsense; LLMs are nonetheless auto-regressively trained and therefore fundamentally a copying technology, so as long as the LLM creators make an effort to train on high-quality code, what's generated should at least match those high-quality patterns to some degree.
Secondly, humans' hard-won best practices for designing code are there for a reason, and it's not just because of the limits of our feeble minds when working with anything-goes spaghetti code. The reason we prefer code that is modular, with thin/clean interfaces between modules, shortish functions, etc, is that these practices fundamentally do make for code that is easier to reason about, to test and debug, and to update and extend without breakage.
Per the Halting Problem, we know that ultimately the only way to know what code does is to run it, and therefore even if LLMs/AI were to exceed humans in general reasoning ability, they would never gain some magical ability to write arbitrarily complex/unstructured code and still successfully reason about what it is doing, its correctness, etc. Following human best practices not only helps create code that is testable (possible to analyze and create test cases for all paths through a piece of code), but also code that can more easily be reasoned about, whether by man or machine.
In terms of where we are right now regarding AI's ability to write good-quality code, it's perhaps informative to look at Claude Code, which is of late writing most of its own code under the guidance of its creator, and which, despite being on the very simple end of the spectrum in terms of software complexity*, currently has about 5,000 issues filed against it on GitHub.
* A minimal CLI agent is a few hundred lines of code vs, for example, the ~15 million LOC of gcc.
Although it sounds counter-intuitive, you may be better off with Gemini 3 Fast (esp. in Thinking mode) than with Gemini 3 Pro. Fast beats Pro in some benchmarks. This is also the summary conclusion that Gemini itself offers.
I've got to wonder what the potential market size is for AI driven software development.
I'd have to guess that competition and efficiency gains will reduce the cost of AI coding tools, but for now we've got $100 or $200/mo premium plans for things like Claude Code (although some users may exceed this and pay more). Call it $1-2K/yr per developer; in the US there are apparently about 1M developers, so even with a 100% adoption rate that's only $1-2B of revenue spread across all providers for the US market... a drop in the bucket for a company like Google, and hardly enough to create a sane price-to-sales ratio for companies like OpenAI or Anthropic given their sky-high valuations.
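The arithmetic is easy to sanity-check. A tiny sketch, using the rough assumed figures above (~1M US developers, $1-2K/yr each), which are guesses, not market data:

```c
/* Back-of-envelope US market size for AI coding tools.
   The ~1M developer and $1-2K/yr figures are this comment's own
   rough assumptions, not actual market data. */
static double us_market_billions(double developers, double usd_per_dev_yr) {
    return developers * usd_per_dev_yr / 1e9;   /* annual revenue in $B */
}
/* us_market_billions(1e6, 1000) -> 1.0   ($1B/yr at $1K/yr per dev)
   us_market_billions(1e6, 2000) -> 2.0   ($2B/yr at $2K/yr per dev) */
```

Even doubling every assumption only gets the ceiling to single-digit billions per year.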
Corporate API usage seems to have potential to be higher (not capped by a fixed size user base), but hard to estimate what that might be.
ChatBots don't seem to be viable for long-term revenue, at least not from consumers, since it seems we'll always have things like Google "AI Mode" available for free.
Another data point: my generally tech-savvy teenage daughter (17) says that her friends are only aware of AI having been available for the last year (3 actually), and basically only use it via Snapchat "My AI" (which is powered by OpenAI) as a homework helper.
I get the impression that most non-techies have either never tried "AI", or regard it as Google (search) on steroids for answering questions.
Maybe it's more related to his (sad but true) senility than to lack of interest, but I was a bit shocked to see the physicist Roger Penrose interviewed recently by Curt Jaimungal: when asked if he had tried LLMs/ChatGPT, he assumed the conversation was about the "stupid lady" (his words) ELIZA (the fake chatbot from the '60s), evidently never having even heard of LLMs!
That sounds more like the fad Atkins weight-loss diet, which said you could eat unlimited meat/fat/protein, but no carbs.
This new RFK Jr diet has something in common with the Paleo "cave man" diet, which at least makes some sense as an argument ("this is what our bodies have evolved to eat") if not in the specifics. I'm not sure where the emphasis on milk/cheese and eggs comes from, since these are all modern, not hunter-gatherer, foods, and largely unhealthy; and putting red meat at the top (more cholesterol, together with the eggs) and whole grain at the bottom makes zero sense - a recipe for heart attacks and colon cancer.
Eggs are very healthy. There's a lot of nutrients that are hard to get from other sources that eggs have in abundance. And it makes sense in just a common-sense sort of way -- if you're a chicken you want to surround your offspring with the best possible food you can as they grow.
With regards to dairy, it's more about a person's individual reaction to it. It's a similar argument with nutrient density (since milk is intended for growing offspring, obviously it's going to be very nutrient dense). The downside is potential inflammation or not having the enzymes to process it.
I would definitely not lump eggs and dairy as "bad" in any way though.
Also, the "cholesterol" thing is a very bad thing to focus on. Cholesterol is not bad! You need cholesterol. (What do you think cell membranes are partially composed of?
Whole grains are not as good as you think. Often, they're made from strains that are optimized for yield and robustness, not nutrition. Also, unless you're exercising a lot you really don't need much in the way of carbs.
> Also, the "cholesterol" thing is a very bad thing to focus on. Cholesterol is not bad! You need cholesterol. (What do you think cell membranes are partially composed of?
There is also not a very strong connection between dietary cholesterol and serum levels, anyway.
There's certainly a difference between modern and ancient grain varieties, but OTOH whole-grain bread is basically what fed at least the western world for the last 2,000 years - bread was the center of the Roman diet and also of the medieval diet - which seems more than long enough (~100 generations; evolution is fast) for this to be the natural "our bodies evolved for this" diet that we should be targeting!
As far as eggs and dairy go, sure, they are healthy for those meant to be consuming them - baby chickens and baby mammals - but that doesn't mean they are good for us in excess.
There have been, and continue to be, so many flip-flops in dietary recommendations and in what is good/bad for you that it seems common sense is a better approach. All things in moderation, and indeed look to what our relatively recent ancestors have been eating to get an idea of what our bodies are evolved to eat - whole foods, not processed ones and chemical additives.
I don't think 2,000 years is enough, but I'm not an expert. The main thing that grains and bread did was make it a lot easier for more people to get through lean times without starving. It also allowed people to specialize: not everyone needed to be a hunter/gatherer.
20,000 years maybe yes. But we have not been agricultural for that long. And that's why grain-based food still is not something we're well adapted to.
Ancient grains are great! But frankly, you're probably not going to find einkorn in your grocery store. It's not just the way whole grains are processed, it's also about the plants they grow from. Also, the way ancient grains are processed is not particularly profitable (they need to sit and ferment, for instance, and the grain itself is much lower yield).
If you want to eat ancient grains I'd say go for it, but when I talk about whole grains I talk about what you're going to find in an average grocery store, and even what you find at a place like Whole Foods is pretty bad.
I highly suspect that nobody other than bodybuilders is eating eggs in excess (if that's even possible -- what bad nutrients are in eggs?). Eggs are kind of a pain in the ass to cook (other than hard boiling), and most processing is about convenience. In any case, things like choline are hard to get from other sources, and I think it's not that wild to assume our ancestors loved to raid birds' nests for nutrient-dense eggs.
Agreed on a lot of flip flops in dietary recommendations, but that definitely doesn't mean that the classic food pyramid was anywhere close to correct.
If you're looking for an excellent supplier of einkorn, I'd suggest Bluebird Grain Farms. (They're local to me, so I'm a bit biased of course. But they are a great group, and their flours and grains are excellent.)
Common sense says that adults are not embryos and humans are not chickens, so if eggs are nutritious for adult humans, it's more of a happy coincidence.
Our hunter-gatherer ancestors ate eggs when they could find them, probably often uncooked. What they generally didn't come across were trees full of Snickers bars, Coke and Wonder Bread.
Your body produces cholesterol naturally, without any meat or dairy. In my case it actually produces way more than I need, even on a vegan diet, because of genetic factors. People should test their LDL and evaluate whether eating cholesterol is healthy _for themselves_ as it’s different for everyone.
Dietary cholesterol does not affect blood serum cholesterol and recommendations to limit cholesterol intake were removed from AHA and ADA guidelines in 2011 and 2013 respectively... the fact that this "common knowledge" still persists is disappointing.
The Paleo diet is utter nonsense. Human gut biome and ability to process different foods evolves far far far faster than that. We are nothing like our paleo ancestors.
I'd have guessed that Google's rapid advance with Gemini was more due to the merger of Google Brain with DeepMind under Demis Hassabis than the return of Sergey Brin.
I remember seeing an interview (Dwarkesh?) with Sholto Douglas, who had been working at Google at the time (he's now at Anthropic), in which he said he would work late there and the only other person around was Sergey Brin, apparently wanting to be part of (or following) the development/training process.
I suppose he may have a list of feature requests and bug reports to work on, but it does seem a bit odd from a human perspective to want to work on 5 or more things literally in parallel, unless they are all so simple that there is no cognitive load and context switching required to mentally juggle them.
Washing dishes in parallel with laundry and cleaning is of course easily possible, but precisely because there is no cognitive load involved. When the washing machine stops you can interrupt what you are doing to load clothes into the drier, then go back to cleaning/whatever. Software development for anything non-trivial obviously has a much higher task-switching overhead. Optimal flow for a purely human developer is to "load context" at the beginning of the day, then remain in flow-state without interruptions.
The cynical part of me also can't help but wonder if Cherny/Anthropic aren't just advocating token-maxxing!
Same thought here. I use Claude Opus via API billing for tasks that aren't that hard to implement, but for which CC takes much less time than I would. However:
* a small PR costs $5-16 (I've been monitoring this for the past two days). Management is already pushing for us to use Cursor or a new tool called Augment Code.
* I can submit 4 to 5 PRs in a day
* the bottleneck becomes:
- writing clear instructions and making the right choices
- running tests
- my mental capacity for context switching
- code reviewing, correcting
- deployment
- even further live testing
I don’t understand how I could have 10 parallel workers without the output degrading due to my inability to manage them. But I can see myself wasting a lot of $$ trying. And something tells me the thread is just normalizing throwing money at them.
I noticed yesterday that there were 5K+ issues filed against Claude Code on GitHub (but down to 4.8K today!), so it may well be that this is what Cherny is churning through.
If you read through a few pages of these issues, it doesn't seem to reflect too well on the quality of the code (self-written by Claude Code), so the furious pace of development/bug fixing maybe shouldn't be taken as the pace of generating production-quality code. Claude Code is of course very useful, so people are very forgiving about issues, but I can't imagine most corporate software being very well regarded if the quality was such that it had 5K issues reported against it!
That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and running on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project.
I've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too.
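For the interpreter, the core really is tiny - just a dispatch loop over bytecodes. Here's a toy sketch with made-up opcode names (OP_PUSH etc.); the actual ISA is still being designed, so this is illustrative only:

```c
/* Toy dispatch loop for a hypothetical stack-based VM.
   The opcodes are invented for illustration; the real ISA is TBD. */
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };   /* made-up opcodes */

int vm_run(const uint8_t *code) {
    int stack[64];
    int sp = 0;                               /* next free stack slot */
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;        /* push literal */
        case OP_ADD:  sp--; stack[sp-1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp-1] *= stack[sp]; break;
        case OP_HALT: return stack[sp-1];                  /* top of stack */
        }
    }
}
```

For example, the program {OP_PUSH,2, OP_PUSH,3, OP_ADD, OP_PUSH,4, OP_MUL, OP_HALT} computes (2+3)*4 and returns 20. A real 8-bit target would of course need bounds checks and a proper encoding.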
I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++.
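For anyone unfamiliar with the technique, the core of precedence climbing is only a couple of dozen lines. This is my own minimal C sketch (single-digit operands, four left-associative operators), not the parser Gemini actually produced:

```c
/* Minimal precedence-climbing evaluator: single-digit operands and
   binary + - * / (all left-associative). A sketch of the technique
   only - not the code Gemini generated. */
static const char *cur;              /* cursor into the expression */

static int prec(char op) {           /* binding power; 0 = not an operator */
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;
}

static int parse_expr(int min_prec) {
    int lhs = *cur++ - '0';          /* primary: one digit */
    while (prec(*cur) >= min_prec) { /* only ops that bind tightly enough */
        char op = *cur++;
        /* left-assoc: recurse one level higher so equal ops don't nest */
        int rhs = parse_expr(prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
    return lhs;
}

int eval(const char *expr) { cur = expr; return parse_expr(1); }
```

So eval("2+3*4") returns 14, and eval("8-3-2") returns 3, i.e. (8-3)-2, confirming left associativity. The appeal over plain recursive descent is one function for all binary operators instead of one per precedence level.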
Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of workflow where you want to direct what to do next, since it seems they've RL'd the heck out of it to suggest next steps and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly the quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).
I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document.
Just as a self follow-up here (I hate to do it!), after chatting to Gemini some more about document-creation alternatives, it does seem that Gemini CLI is by far the best way to go, since it works in similar fashion to Claude Code, making targeted edits (string replacements) to files rather than regenerating from scratch (unless it has misinterpreted something you said as a request to do that, which would be obvious when it showed you the suggested diff).
Another alternative (not recommended due to the potential for "drift") is to use Gemini's Canvas capability, where it works on a document rather than a specification spread out over chat; but this document is fully regenerated on every update (unlike Claude's Artifacts), so there is potential for it to summarize or drop sections of the document ("drift") rather than just making the requested changes. Canvas also doesn't have Artifacts' versioning to let you go back and undo unwanted drifts/changes.
Yeah, the online Gemini app is not good for long lived conversations that build up a body of decisions. The context window gets too large and things drop.
What I’ve learned is that once you reach that point you’ve got to break that problem down into smaller pieces that the AI can work productively with.
If you’re about to start with Gemini CLI, I recommend you look up https://github.com/github/spec-kit. It’s a project out of Microsoft/GitHub that encodes a rigorous spec-then-implement multi-pass workflow. It gets the AI to produce specs, double-check the specs for holes and ambiguity, plan out implementation, translate that into small tasks, then check them off as it goes. I don’t use spec-kit all the time, but it taught me what explicit multi-pass prompting can do when the context is held in files on disk, often markdown that I can go in and change as needed. I think it basically comes down to enforcing enough structure in the form of codified processes, self-checks and/or tests for your code.
Pro tip, tell spec-kit to do TDD in your constitution and the tests will keep it on the rails as you progress. I suspect “vibe coding” can get a bad rap due to lack of testing. With AI coding I think test coverage gets more important.
Sure, eventually we'll have AGI, then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help.
I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next LLM-to-AGI step, where it becomes capable of using human-level judgement and reasoning, etc, to become a developer, not just a coding tool.