
All these improvements in a single year, 2025. While this may seem obvious to those who follow the AI / LLM news, it may be worth pointing out again that ChatGPT was only introduced to us in November 2022.

I still don't believe AGI, ASI or whatever AI will overtake humans in a short period of time, say 10 - 20 years. But it is hard to argue against the value of current AI, which many of the vocal critics on HN seem determined to do. People are willing to pay $200 per month, and it is getting a $1B runway already.

Being more of a hardware person, the most interesting part to me is the funding of all the latest hardware development. I know this is another topic HN hates because of the DRAM and NAND pricing issue, but it is exciting to see this from a long-term view where the pricing is short-term pain. Right now the industry is asking: we collectively have over a trillion dollars to spend on capex over the next few years, and will borrow more if we need to, so when can you ship us 16A / 14A / 10A and 8A or 5A, LPDDR6, higher-capacity DRAM at lower power usage, better packaging, higher-speed PCIe, or a jump to optical interconnect? Every single part of the hardware stack is being flooded with money and demand. The last time we had this was the Post-PC / smartphone era, which drove the hardware industry forward for 10 - 15 years. The current AI wave can push hardware for at least another 5 - 6 years while pulling forward tech that was initially 8 - 10 years away.

I so wish I had bought some Nvidia stock. Then again, I guess no one knew AI would be as big as it is today, and it has only just started.





This is not a great argument:

> But it is hard to argue against the value of current AI [...] it is getting a $1B runway already.

The psychic services industry makes over $2 billion a year in the US [1], with about a quarter of the population being actual believers. [2].

[1] https://www.ibisworld.com/united-states/industry/psychic-ser...

[2] https://news.gallup.com/poll/692738/paranormal-phenomena-met...


What if these provide actual value through the placebo effect?

I think we have different definitions of "actual value". But even if I pick the flaccid definition, that isn't proof of value of the thing itself, but of any placebo. In which case we can focus on the cheapest/least harmful placebo. Or, better, solving the underlying problem that the placebo "helps".

I'll preface by saying I fully agree that psychics aren't providing any non-placebo value to believers, although I think it's fine to provide entertainment for non-believers.

> Or, better, solving the underlying problem that the placebo "helps".

The underlying problems are often a lack of a decent education and a generally difficult/unsatisfying life. Systemic issues which can't be meaningfully "solved" without massive resources and political will.


If we look back over the last century or so, I think we've made excellent progress on that. The main current barrier is that we've lately let people with various pathologies run wild, but historically that creates enough problems that the political will emerges. See, e.g., the American and French revolutions, or India's independence, or the US civil war and Reconstruction.

Actually, I'd go one step further and say they are harmful to everybody else.

It might just be my circles, but I've seen that Carl Sagan quote everywhere in the last couple of months.

"“Science is more than a body of knowledge; it is a way of thinking. I have a foreboding of an America in my children’s or grandchildren’s time—when the United States is a service and information economy; when nearly all the key manufacturing industries have slipped away to other countries; when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what’s true, we slide, almost without noticing, back into superstition and darkness.”"


You talking about psychics or LLMs?


2022/2023: "It hallucinates, it's a toy, it's useless."

2024/2025: "Okay, it works, but it produces security vulnerabilities and makes junior devs lazy."

2026 (Current): "It is literally the same thing as a psychic scam."

Can we at least make predictions for 2027? What shall the cope be then! Lemme go ask my psychic.


I suppose it's appropriate that you hallucinated an argument I did not make, attacked the straw man, and declared victory.

Ironically, the human tendency to read far too much into things for which we have far too little data, does seem to still be one of the ways we (and all biological neural nets) are more sample-efficient than any machine learning.

I have no idea if those two points, ML and brains, are just different points on the same Pareto frontier of some useful metrics, but I am increasingly suspecting they might be.


2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?


I went from using ChatGPT 3.5 for functions and occasional scripts…

… to one of the models in Jan 2024 being able to repeatedly add features to the same single-page web app without corrupting its own work or hallucinating the APIs it had itself previously generated…

… to last month using a gifted free week of Claude Code to finish one project and then also have enough tokens left over to start another fresh project which, on that free left-over credit, reached a state that, while definitely not well engineered, was still better than some of the human-made pre-GenAI nonsense I've had to work with.

Wasn't 3 hours, and I won't be working on that thing more this month either because I am going to be doing intensive German language study with the goal of getting the language certificate I need for dual citizenship, but from the speed of work? 3 weeks to make a startup is already plausible.

I won't say that "software engineering" is dead. In a lot of cases however "writing code" is dead, and the job of the engineer should now be to do code review and to know what refactors to ask for.


So you did some basic web development and built a "not well engineered" greenfield app that you didn't ship, and from that your conclusion is that "writing code is dead"?

In half a week with left-over credit.

What do you think the first half of the credit was spent on?

In addition to the other projects it finished off for me, the reason I say "coding is dead" is that even this mediocre quality code is already shippable. Customers do not give a toss if it has clean code or nicely refactored python backend, that kind of thing is a pain point purely for developers, and when the LLM is the developer then the LLM is the one who gets to be ordered to pay down the technical debt.

The other project (and a third one I might have done on a previous free trial) are as complete as I care to make them. They're "done" in a way I'm not used to being possible with manual coding, because LLMs can finish features faster than I can think of new useful features to add. The limiting factor is my ability to do code review, or would be if I got the more expensive option; as I was on a free trial, I could do code review about twice as fast as I burned through tokens (given what others say about the more expensive option, that either means I need to learn to code review faster, or my risk tolerance is lower than theirs).

Now, is my new 3-day web app a viable business idea? It would've been shippable as-is 5-6 years ago; I saw worse live around then. Today? Hard to say: if markets were efficient, everyone would know LLMs can create this kind of thing easily and nobody could charge for it, but people like yourself who disbelieve are an example of markets not being efficient; people like you can have apps like these sold to them.

That said, I try not to look at where the ball is but where it is going. For business ideas, I have to figure out what *doesn't* scale, and do that. Coding *does* scale now, that's why coding is dead.

I expect to return to this project in a month. Have one of the LLMs expand it and develop it for more than the 3 days spent so far, turn it into something I'd actually be happy to sell. Like I said, it seems like we're at "3 weeks" not "3 hours" for a decent MVP by current standards, but the floor is rising fast.


The cope + disappointment will be knowing that a large population of HN users will paint a weird alternative reality. There is a multitude of messages about AI out there, some highly detached from reality (on both the optimistic and the pessimistic side). And then there is the rational middle: professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how to best leverage them to get the most mileage).

I don't see software engineering being "dead" ever, but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5; that was 17 months of development and the leaps in performance are quite impressive. You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks) and also push towards the next paradigm to solve things like continual learning.

Some folks have opted not to use coding agents (and some folks like yourself seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.

Please do provide some data for this "obvious value of coding agents", because right now the only obvious things are the increase in vulnerabilities, people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

Sure: at my MAANG company, where I watch the adoption data closely for CC and other internal coding agent tools, most (significant) LOC are written by agents, most employees are WAU of coding agents, and the adoption rate is positively correlated with seniority.

Like a lot of things LLM related (Simon Willison's pelican test, researchers + product leaders implementing AI features) I also heavily "vibe" check the capabilities myself on real work tasks. The fact of the matter is I am able to dramatically speed up my work. It may be actually writing production code + helping me review it, or it may be tasks like: write me a script to diagnose this bug I have, or build me a streamlit dashboard to analyze + visualize this ad hoc data instead of me taking 1 hour to make visualizations + munge data in a notebook.
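
For concreteness, the "streamlit dashboard" kind of task is roughly this shape (a throwaway sketch; the file and column names here are made up, not anything from actual work):

    # Rough sketch of a throwaway dashboard; CSV and column names are placeholders.
    import pandas as pd
    import streamlit as st

    df = pd.read_csv("adhoc_metrics.csv")                      # hypothetical ad hoc data dump
    st.title("Ad hoc metrics")
    metric = st.selectbox("Metric", [c for c in df.columns if c != "date"])
    st.line_chart(df.set_index("date")[metric])                # quick look at the trend
    st.dataframe(df.describe())                                # summary stats, no notebook munging

Run it with `streamlit run dashboard.py` and you have something clickable in a few minutes; nothing I'd ship, but it replaces the notebook hour.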

> people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

what would satisfy you here? I feel you are strawmanning a bit by picking the most hyperbolic statements and then blanketing that on everyone else.

My workflow is now:

- Write code exclusively with Claude

- Review the code myself + use Claude as a sort of review assistant to help me understand decisions about parts of the code I'm confused about

- Provide feedback to Claude to change / steer it away or towards approaches

- Give up when Claude is hopelessly lost

It takes a bit to get the hang of the right balance but in my personal experience (which I doubt you will take seriously but nevertheless): it is quite the game changer and that's coming from someone who would have laughed at the idea of a $200 coding agent subscription 1 year ago


We probably work at the same company, given you used MAANG instead of FAANG.

As one of the WAU (really DAU) you're talking about, I want to call out a couple of things:

1) The LOC metrics are flawed, and anyone using the agents knows this - e.g., ask CC to rewrite the 1 commit you wrote into 5 different commits and now you have 5 100% AI-written commits.

2) Total speed-up across the entire dev lifecycle is far below 10x, most likely below 2x, but I don't see any evidence of anyone measuring the counterfactuals to prove a speed-up anyway, so there's no clear data.

3) Look at token spend for power users; you might be surprised by how many SWE-years they're spending.

Overall it’s unclear whether LLM-assisted coding is ROI-positive.


To add to your point:

If the M stands for Meta, I would also like to note that as a user, I have been seeing increasingly poor UI, of the sort I'd expect from people committing code that wasn't properly checked before going live, as I would expect from vibe coding in the original sense of "blindly accept without review". Like, some posts have two copies of the sender's name in the same location on screen with slightly different fonts going out of sync with each other.

I can easily believe the metrics that all [MF]AANG bonuses are denominated in are going up, our profession has had jokes about engineers gaming those metrics even back when our comics were still printed in books: https://imgur.com/bug-free-programs-dilbert-classic-tyXXh1d


Oh yes, all of this I agree with. I had tried to clarify this above but your examples are clearer. My point is: all measures and studies I have personally seen of AI impact on productivity have been deeply flawed for one reason or another.

Total speed up is WAY less than 10x by any measure. 2x seems too high too.

By data alone the impact is a bit unclear, I agree. But I will say there seems to be a clear picture that, to me, starting from a prior formed from personal experience, indicates some real productivity impact today, with a trajectory suggesting that claims of a lot of SWE work being offloaded to agents over the next few years are not that far fetched.

- adoption and retention numbers internally and externally. You can argue this is driven by perverse incentives and/or the perception-performance mismatch, but I'm highly skeptical of this; even though the effects of both are probably real, it would be truly extraordinary to me if there weren't at least a ~10-20% bump in productivity today, with a lot of headroom to go as integration gets better, user skill gets better and model capabilities grow

- benchmark performance; again, benchmarks are really problematic, but there are a lot of them and all of them together paint a pretty clear picture of capabilities truly growing, and growing quickly

- there are clearly biases we can think of that would cause us to overestimate AI impact, but there are also biases that may cause us to underestimate impact: e.g. I’m now able to do work that I would have never attempted before. Multitasking is easier. Experiments are quicker and easier. That may not be captured well by e.g. task completion time or other metrics.

I even agree: quality of agentic code can be a real risk, but:

- I think this ignores the fact that humans have also always written shitty code and always will; there is lots of garbage in production believe me, and that predates agentic code

- as models improve, they can correct earlier mistakes

- it’s also a muscle to grow: how to review and use humans in the loop to improve quality and set a high bar


Anecdotes don't prove anything, especially ones without any metrics, and especially at a MAANG where AI use is strongly incentivized.

Evidence is peer reviewed research, or at least something with metrics. Like the METR study that shows that experienced engineers often got slower on real tasks with AI tools, even though they thought they were faster.


That's why I gave you data! The METR study was 16 people using Sonnet 3.5/3.7. The data I'm talking about covers tens of thousands of people and is much more up to date.

There are some counterexamples to METR in the literature (below), but I'll just say: "rigor" here is very difficult (including for METR) because outcomes are high dimensional and nuanced, or ecological validity is an issue. It's hard to have any approach that someone wouldn't be able to dismiss due to some issue they have with the methodology. The sources below also have methodological problems, just like METR.

https://arxiv.org/pdf/2302.06590 -- 55% faster implementing an HTTP server in JavaScript with Copilot (in 2023!), but this is a single task and not really representative.

https://demirermert.github.io/Papers/Demirer_AI_productivity... -- "Though each experiment is noisy, when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool. Notably, less experienced developers had higher adoption rates and greater productivity gains." (but e.g. "completed tasks" as the outcome measure is of course problematic)

To me, internal company measures for large tech companies will be most reliable -- they are easiest to track and measure, the scale is large enough, and the talent + task pool is diverse (junior -> senior, different product areas, different types of tasks). But then outcome measures are always a problem... commits per developer per month? LOC? task completion time? All of them are highly problematic, especially because it's reasonable to expect AI tools would change the bias and variance of the proxy, so it's never clear if you're measuring the change in "style" or the change in the underlying latent measure of productivity you care about.


To be fair, I’ll take a non-biased 16 person study over “internal measures” from a MAANG company that burned 100s of billions on AI with no ROI that is now forcing its employees to use AI.

What do you think about the METR 50% task length results? About benchmark progress generally?

I don't speak for bopbopbop7, but I will say this: my experience of using Claude Code has been that it can do much longer tasks than the METR benchmark implies are possible.

The converse of this is that if those tasks are representative of software engineering as a whole, I would expect a lot of other tasks where it absolutely sucks.

This expectation is further supported by the number of times people pop up in conversations like this to say that any given LLM falls flat on its face even on something the poster thinks is simple, and that it cost more time than it saved.

As with supposedly "full" self driving on Teslas, the anecdotes about the failure modes are much more interesting than the success: one person whose commute/coding problem happens to be easy, may mistake their own circumstances for normal. Until it does work everywhere, it doesn't work everywhere.

When I experiment with vibe coding (as in, properly unsupervised), it can break down large tasks into small ones and churn through each sub-task well enough, such that it can do a task I'd expect to take most of a sprint by itself. Now, that said, I will also say it seems to do these things a level of "that'll do" not "amazing!", but it does do them.

But I am very much aware this is like all the people posting "well my Tesla commute doesn't need any interventions!" in response to all the people pointing out how it's been a decade since Musk said "I think that within two years, you'll be able to summon your car from across the country. It will meet you wherever your phone is … and it will just automatically charge itself along the entire journey."

It works on my [use case], but we can't always ship my [use case].


I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers?

If you insist or believe in a conspiracy I don’t think there’s really anything I or others will be able to say or show you that would assuage you, all I can say is I’ve seen the raw data. It’s a mess and again we’re stuck with proxies (which are bad since you start conflating the change in the proxy-latent relationship with the treatment effect). And it’s also hard and arguably irresponsible to run RCTs.

All I will say is: there are flaws everywhere. METR results are far from conclusive. Totally understandable if there is a mismatch between perception and performance. But also consider: even if task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load so I can work in parallel sessions on two completely different issues.


I bet it does reduce your cognitive load, considering you, in your own words "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.

I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.

Meta internal study showed a 6-12% productivity uplift.

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> - Give up when Claude is hopelessly lost

You love to see "Maybe completely waste my time" as part of the normal flow for a productivity tool


That negates everything else? If you have a tool that can boost you for 80% of your work and for the other 20% you just have to do what you’re already doing, is that bad?

There's a reason why sunk cost IS a fallacy and not a sound strategy.

The productivity uplift is massive, Meta got a 6-12% productivity uplift from AI coding!

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks)

This is a surprising claim. There's only 3 orders of magnitude between US data centre electricity consumption and worldwide primary energy (as in, not just electricity) production. Worldwide electricity supply is about 3/20ths of world primary energy, so without very rapid increases in electricity supply there's really only a little more than 2 orders of magnitude growth possible in compute.

Renewables are growing fast, but "fast" means "will approach 100% of current electricity demand by about 2032". Which trend is faster, growth of renewable electricity or growth of compute? Trick question, compute is always constrained by electricity supply, and renewable electricity is growing faster than anything else can right now.
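
To make the arithmetic explicit (a back-of-envelope sketch using the rough ratios above, not measured figures):

    # Back-of-envelope headroom for compute growth within current electricity supply.
    # Inputs are the rough ratios quoted above, not precise measurements.
    import math

    dc_to_primary_oom = 3          # ~3 OOM between US data-centre electricity and world primary energy
    electricity_share = 3 / 20     # world electricity as a fraction of world primary energy

    headroom_oom = dc_to_primary_oom + math.log10(electricity_share)
    print(round(headroom_oom, 1))  # ~2.2, i.e. "a little more than 2 orders of magnitude"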


This is not my own claim, it’s based on the following analysis from Epoch: https://epoch.ai/blog/can-ai-scaling-continue-through-2030

But I forgot how old that article is: it’s 4 orders of magnitude past GPT-4 in terms of total compute which is I think only 3.5 orders of magnitude from where we are today (based on 4.4x scaling/yr)


The nature of my job has always been fighting red tape, process, and stakeholders to deploy very small units of code to production. AI really did not help with much of that for me in 2025.

I'd imagine I'm not the only one in a similar situation. Until all those people and processes can be swept away in favor of letting LLMs YOLO everything into production, I don't see how that changes.


No I think that's extremely correct. I work at a MAANG where we have the resources to hook up custom internal LLMs and agents to actually deal with that but that is unique to an org of our scale.

2025 was the year of tool-using AI agents for development. I think we'll shift attention to tool-using AI agents outside development. Most business users are still stuck using ChatGPT as some kind of grand oracle that will write their email or PowerPoint slides. There are bits and pieces of mostly technology-demo-level solutions, but nothing that is widely used the way AI coding tools are so far. I don't think this is bottlenecked on model quality.

I don't need an AGI. I do need a secretary type agent that deals with all the simple but yet laborious non technical tasks that keep infringing on my quality engineering time. I'm CTO for a small startup and the amount of non technical bullshit that I need to deal with is enormous. Some examples of random crap I deal with: figuring out contracts, their meaning/implication to situations, and deciding on a course of action; Customer offers, price calculations, scraping invoices from emails and online SAAS accounts, formulating detailed replies to customer requests, HR legal work, corporate bureaucracy, financial planning, etc.

A lot of this stuff can be AI-assisted (and we get a lot of value out of AI tools for this) but context engineering is taking up a non-trivial amount of my time. Also most tools are completely useless at modifying structured documents. Refactoring a big code base, no problem. Adding structured text to an existing structured document, hardest thing ever. The state of the art here is an ff-ing sidebar that will suggest you a markdown-formatted text that you might copy/paste. Tool quality is very primitive. And then you find yourself just stripping all formatting and reformatting it manually. Because the tools really suck at this.


> Some examples of random crap I deal with: figuring out contracts, their meaning/implication to situations, and deciding on a course of action

This doesn’t sound like bullshit you should hand off to an AI. It sounds like stuff you would care about.


I do care about it; kind of my duty as a co-founder. Which is why I'm spending double digit percentages of my time doing this stuff. But I absolutely could use some tools to cut down on a lot of the drudgery that is involved with this. And me reading through 40 pages of dense legal German isn't one of my strengths since I 1) do not speak German 2) am not a lawyer and 3) am not necessarily deeply familiar with all the bureaucracy, laws, etc.

But I can ask an LLM intelligent questions about that contract (in English) and shoot a few things back and forth, come up with some kind of action plan, and then run it by our lawyers and other advisors.

That's not some kind of hypothetical thing. That's something that happened multiple times in our company in the last few months. LLMs are very empowering for dealing with this sort of thing. You still need experts for some stuff. But you can do a lot more yourself now. And as we've found out, some of the "experts" that we relied on in the past actually did a pretty shoddy job. A lot of this stuff was about picking apart the mess they made and fixing it.

As soon as you start drafting contracts, it gets a lot harder. I just went through a process like that as well. It involves a lot of manual work that is basically about formatting documents, drafting text, running PDFs and text snippets through ChatGPT for feedback, sparring, criticism, etc. and iterating on that. This is not about vibe coding some contract but making sure every letter of a contract is right. That ultimately involves lawyers and negotiating with other stakeholders, but it helps if you come prepared with a more or less ready-to-sign-off-on document.

It's not about handing stuff off but about making LLMs work for you. Just like with coding tools. I care about code quality as well. But I still use the tools to save me a lot of time.


One of the lessons I learned running a startup is that it doesn't matter how good the professionals you hire are for things like legal and accounting, you still need to put work in yourself.

Everyone makes mistakes and misses things, and as the co-founder you have to care more about the details than anyone else does.

I would have loved to have weird-unreliable-paralegal-Claude available back when I was doing that!


Agree. Even asking it can anchor your thinking.

> Also most tools are completely useless at modifying structured documents

We built a tool for this for the life science space and are opening it up to the general public very soon. Email me and I can give you access (topaz at vespper dot com).


you don't need AGI, you need human labor

> All these improvements in a single year

> hard to argue against the value of current AI

> People are willing to pay $200 per month, and it is getting a $1B runway already.

Those are 3 different things. There can be a LOT of fast and significant improvements while still remaining extremely far from the actual goal; so far it looks like actually little progress.

People pay for a lot of things, including snake oil, so convincing a lot of people to pay a bit is not in itself a proof of value, especially when some people are basically coerced into this: see how many companies changed their "strategy" to mandating AI usage internally, or integration for a captive audience, e.g. Copilot.

Finally yes, $1B is a LOT of money for you and me... but for the largest corporations it's actually not a lot. For reference, Google earned that in revenue... per day in 2023. Anyway, that's still a big number, BUT it still has to be compared with, well, how much OpenAI burns. I don't have any public number on that, but I believe the consensus is that it's a lot. So until we know that number we can't talk about an actual runway.


> People pay for a lot of things, including snake oil, so convincing a lot of people to pay a bit is not in itself a proof of value

But do you really believe e.g. Claude Code is snake oil? I pay $200 / month for Claude, which is something I would have thought monumentally insane maybe 1-2 years ago (e.g. when ChatGPT came out with their premium subscription price I thought that seemed so out of touch). I don't think we would be seeing the subscription rates and the retention numbers if it really was snake oil.

> Finally yes, $1B is a LOT of money for you and me... but for the largest corporations it's actually not a lot. For reference, Google earned that in revenue... per day in 2023. Anyway, that's still a big number, BUT it still has to be compared with, well, how much OpenAI burns. I don't have any public number on that, but I believe the consensus is that it's a lot. So until we know that number we can't talk about an actual runway.

This gets brought up a lot, but I'm not sure I understand why folks on a forum run by YCombinator, a startup accelerator, would make this sound like an obvious sign of charlatanism; operating at a loss is nothing new, and the Anthropic / OpenAI strategy seems perfectly rational: they are scaling and capturing market share, and the TAM is insane.


> many companies changed their "strategy" to mandating AI usage internally

Are they hiring? My job is still dragging its feet on approving copilot.


Investing a trillion dollars for a revenue of a billion dollars doesn't sound great yet.

Companies benefiting from the trillion dollars spent during the dotcom era have certainly made more than a billion dollars over the last 20 years.

Intellectual dishonesty is certainly rampant on HN.


Indeed, it's the old Uber playbook at nearly two extra orders of magnitude.

It is a large enough number that it may simply run out of private capital to consume before it turns cash-flow positive.

Lots of things sell well if sold at such a loss. I’d take a new Ferrari for $2500 if it was on offer.


Did Uber actually do a lot of capital investment? They don't own the cars, for example.

Uber nakedly broke the law and beat down labor, I'm honestly shocked none of the executives went to prison.

Uber didn’t beat down labor, they beat down capital, specifically the capital that owned (and lobbied for the existence of) taxi medallions

No, you clearly have never talked to the workers at Uber (no not the devs, the drivers). Uber has disgustingly fought against unionization efforts, employee benefit efforts, and increasing wages. Such actions do not make you a good company, especially when the executives make millions while fighting against workers wanting a better life.

They are an evil company and the rot has been there since inception. This isn't even getting into their disgusting internal predator culture against women either.


I believe they spent a huge amount of money on incentives to help sign up drivers, and discounts to help attract customers.

Yes, but that's loss leader rather than capital investment. You can't put a customer on the balance sheet and depreciate them. Once you've paid for a free ride, you own nothing tangible.

You say that as if Uber's playbook didn't work. Try this: https://www.google.com/finance/quote/UBER:NYSE

Uber’s playbook worked for Uber

Seems like Nvidia will be focusing on the super beefy GPUs and leaving the consumer market to a smaller player

I don't get why Nvidia can't do both? Is it because of the limited production capabilities of the factories?

Yes. If you're bottlenecked on silicon and secondaries like memory, why would you want to put more of those resources into lower margin consumer products if you could use those very resources to make and sell more high margin AI accelerators instead?

From a business standpoint, it makes some sense to throttle the gaming supply some. Not to the point of surrendering the market to someone else probably, but to a measurable degree.


We will have to wait and see, but my bet is that Nvidia will move to the leading-edge N2 node earlier now that they have the margin to work with. Both Hopper and Blackwell were too late in the design cycle. The AI hype will continue, and those customers will buy the latest and greatest, leaving gaming at a mainstream node.

Nvidia using a mainstream node has always been the norm, considering most fab capacity always goes to mobile SoCs first. But I expect the internet / gamers will be angry anyway because Nvidia does not provide them with the latest and greatest.

In reality the extra R&D cost of designing on the leading edge will be amortised by all the AI orders, which gives Nvidia a competitive advantage at the consumer level when they compete. That is assuming there is competition, because the most recent data has shown Nvidia owning 90%+ of the discrete market share, 9% for AMD and 1% for Intel.


AMD owns a lot of the consumer market already; handhelds, consoles, desktop rigs and mobile ... they are not a small player.

Intel's client computing revenue was greater than AMD's entire revenue last quarter

They said "smaller" not small.

It's a great tool, but right now it's only being used to feed the greed.

>> Then again, I guess no one knew AI would be as big as it is today, and it has only just started.

People have been saying similar things about self-driving cars for years now. "AI" is another one of those expensive ideas where we'll get 85% of the way there and then the other 15% will be way more expensive than anyone wants to pay for. It's already happening - HW prices and electricity - people are starting to ask, "if I put more $ into this machine, when am I actually going to start getting money out?" The "true believers" are like, soon! But people are right to be hugely skeptical.


There are some things it's really great at. For example, handling a css layout. If we have to spend trillions of dollars and get nothing else out of it other than being able to vertically center a <div> without wrestling with css and wanting to smash the keyboard in the process, it will all have been worth it.

Not to be cheeky, but isn’t this just

display: flex; align-items: center;

now?


I agree -- skepticism is totally healthy. And there are so many great ways to poke holes in the true underlying narratives (not the headlines that people seem to pull from). E.g. evaluation science is a wasteland (not for want of very smart people trying very hard to get it right). How do we tackle the power requirements in a way that is sustainable? Etc. etc.

But stuff like this I'm not sure I understand:

> It's a great tool, but right now it's only being used to feed the greed.

if it's a great tool, then how is it _only_ being used to "feed the greed", and what do you mean by that?

Also I think folks are quick to make analogies to other points in history: "AI is like the dot com boom, we're going to crash and burn" and "AI is like {self driving cars, crypto, etc} and the promises will all be broken, it's all hype". But this removes the nuance: all of these things are extremely different, with very specific dynamics that in _some_ ways may be similar but in many crucial and important ways are completely different.


>> if it's a great tool, then how is it _only_ being used to "feed the greed", and what do you mean by that?

Look around?


Very confused, I still don’t know what you mean at all

You mean someone using Claude Code is greedy?

> Every single part of the hardware stack is being flooded with money and demand. The last time we had this was the Post-PC / smartphone era, which drove the hardware industry forward for 10 - 15 years. The current AI wave can push hardware for at least another 5 - 6 years while pulling forward tech that was initially 8 - 10 years away.

It's very unclear how much end-consumer hardware and DIY builders will benefit from that, as opposed to server-grade hardware that only makes sense for the enterprise market. It could have the opposite effect, like hardware manufacturers leaving the consumer market (as in the case of Micron), because there's just not that much money in it.


Not all of these are improvements. From the list:

* The year of YOLO and the Normalization of Deviance

* The year that Llama lost its way

* The year of alarmingly AI-enabled browsers

* The year of the lethal trifecta

* The year of slop

* The year that data centers got extremely unpopular


Not that YOLO, PJ Reddie released that in 2015

Said differently - the year we start to see all of the externalities of a globally scaled hyped tech trend.

> * The year that data centers got extremely unpopular

I was discussing the political angle with a friend recently. I think the Big Tech Bro / VC complex has done itself a big disservice by aligning so tightly with MAGA, to the point that AI will be a political issue in 2026 & 2028.

Think about the message they've inadvertently created for themselves - AI is going to replace jobs, it's pushing electricity prices up, we need the government to bail us out AND give us a regulatory light touch.

Super easy campaign for Dems - big tech trumpers are taking your money, your jobs, causing inflation, and now they want bailouts !!


> People are willing to pay $200 per month

Some people are of course, but how many?

> ... People are willing to pay $200 per month

This is just low-key hype. Careful with your portfolio...


Is the AI progress in 2025 an outstanding breakthrough? Not really. It's impressive but incremental.

Still, the gap between the capabilities of a cutting-edge LLM and those of a human is only so wide. There are only so many increments it takes to cross it.


>> But it is hard to argue against the value of current AI, which many of the vocal critics on HN seem determined to do.

What is the concrete business case? Can anyone point to a revenue producing company using AI in production, and where AI is a material driver of profits?

Tool vendors don’t count. I’m not interested in how much money is being made selling shovels...show me a miner who actually struck gold please.


A lot of programmers seem willing to pay for the likes of Claude Code, presumably because it helps them get more done. Programmers cost money so that's a potential cost saving?

[flagged]


Sam Altman [1] certainly seems to talk about AGI quite a bit

[1] https://blog.samaltman.com/reflections


Honestly, I wouldn't be surprised if a system that's an LLM at its core can attain AGI. With nothing but incremental advances in architecture, scaffolding, training and raw scale.

Mostly the training. I put less and less weight on "LLMs are fundamentally flawed" and more and more of it on "you're training them wrong". Too many "fundamental limitations" of LLMs are ones you can move the needle on with better training alone.

The LLM foundation is flexible and capable, and the list of "capabilities that are exclusive to the human mind" is ever shrinking.


They seem to be missing a bit on learning as you go and thinking about things and getting new insights.

In-context learning and reasoning cover that already, and you could expand on that. Nothing prevents an LLM from fine-tuning itself either, other than its own questionable fine tuning skills and the compute budget.

That depends on how you define AGI - it's a meaningless term to use since everyone uses it to mean different things. What exactly do you mean ?!

Yes, there is a lot that can be improved via different training, but at what point is it no longer a language model (i.e. something that auto-regressively predicts language continuations)?

I like to use an analogy to the children's "Stone Soup" story whereby a "stone soup" (starting off as a stone in a pot of boiling water) gets transformed into a tasty soup/stew by strangers incrementally adding extra ingredients to "improve the flavor" - first a carrot, then a bit of beef, etc. At what point do you accept that the resulting tasty soup is not in fact stone soup?! It's like taking an auto-regressively SGD-trained Transformer, and incrementally tweaking the architecture, training algorithm, training objective, etc, etc. At some point it becomes a bit perverse to choose to still call it a language model

Some of the "it's just training" changes that would be needed to make today's LLMs more brain-like may be things like changing the training objective completely from auto-regressive to predicting external events (with the goal of having it be able to learn the outcomes of its own actions, in order to be able to plan them), which to be useful would require the "LLM" to then be autonomous and act in some (real/virtual) world in order to learn.

Another "it's just training" change would be to replace pre/mid/post-training with continual/incremental runtime learning, to again make the model more brain-like and able to learn from its own autonomous exploration of behavior/action and environment. This is a far more profound, and ambitious, change than just fudging incremental knowledge acquisition for some semblance of "on the job" learning (which is what the AI companies are currently working on).

If you put these two "it's just training/learning" enhancements together then you've now got something much more animal/human-like, and much more capable than an LLM, but it's already far from a language model - something that passively predicts the next word every time you push the "generate next word" button. This would now be an autonomous agent, learning how to act and control/exploit the world around it. The whole pre-trained, same-for-everyone model running in the cloud would then be radically different - every model instance is then more like an individual learning based on its own experience, and maybe you're now paying for the continual-learning compute rather than just "LLM tokens generated".

These are "just" training (and deployment!) changes, but to more closely approach human capability (but again, what do you mean by "AGI"?) there would also need to be architectural changes and additions to the "Transformer" architecture (add looping, internal memory, etc), depending on exactly how close you want to get to human/animal capability.


> which to be useful would require the "LLM" to then be autonomous and act in some (real/virtual) world in order to learn.

You described modern RLVR for tasks like coding. Plug an LLM into a virtual env with a task. Drill it based on task completion. Force it to get better at problem-solving.

It's still an autoregressive next token prediction engine. 100% LLM, zero architectural changes. We just moved it past pure imitation learning and towards something else.
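
Schematically, the loop looks something like this (just a toy sketch of the idea, not anyone's actual pipeline; `policy`, `sample` and `run_tests` are stand-ins):

    # Toy sketch of an RLVR-style loop: sample a solution, check it with a verifier
    # (e.g. the task's tests), reward on pass/fail, update the policy. Everything
    # model-specific is stubbed out.
    import random

    def sample(policy, task):
        """Stand-in for the LLM generating a candidate solution."""
        return policy(task)

    def run_tests(task, solution):
        """Stand-in verifier; in practice: run unit tests, check the final answer, etc."""
        return random.random() < 0.5                            # placeholder outcome

    def rlvr_epoch(policy, tasks, update):
        for task in tasks:
            solution = sample(policy, task)
            reward = 1.0 if run_tests(task, solution) else 0.0  # verifiable reward
            update(policy, task, solution, reward)              # e.g. a policy-gradient step

The model inside `policy` is the same next-token predictor; only what it gets rewarded for changes.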


Yes, if all you did was replace current pre/mid/post training with a new (elusive holy grail) runtime continual learning algorithm, then it would definitely still just be a language model. You seem to be talking about it having TWO runtime continual learning algorithms, next-token and long-horizon RL, but of course RL is part of what we're calling an LLM.

It's not obvious, if you just did this without changing the learning objective from self-prediction (auto-regressive) to external prediction, whether you'd actually gain much capability though. Auto-regressive training is what makes LLMs imitators - always trying to do the same as before.

In fact, if you did just let a continual learner autonomously loose in some virtual environment, why would you expect it to do anything different, other than continual learning from whatever it was exposed to in the environment, from putting a current LLM in a loop, together with tool use as a way to expose it to new data? An imitative (auto-regressive) LLM doesn't have any drive to do anything new - if you just keep feeding its own output back in as an input, then it's basically just a dynamical system that will eventually settle down into some attractor states representing the closure of the patterns it has learnt and is generating.

If you want the model to behave in a more human/animal-like self-motivated agentic fashion, then I think the focus has to be on learning how to act to control and take advantage of the semi-predictable environment, which is going to be based on having prediction of the environment as the learning objective (vs auto-regressive), plus some innate drives (curiosity, boredom, etc) to bias behavior to maximize learning and creative discovery.

Continual learning also isn't going to magically solve the RL reward problem (how do you define and measure RL rewards in the general, non-math/programming, case?). In fact post-training is a very human-curated affair, since humans have identified math and programming as tasks where this works and have created these problem-specific rewards. If you wanted the model to discover its own rewards at runtime, as part of your new runtime RL algorithm perhaps, then you'd have to figure out how to bake that into the architecture.


No. There are no architectural changes and no "second runtime learning algorithm". There's just the good old in-context learning that all LLMs get from pre-training. RLVR is a training stage that pressures the LLM to take advantage of it on real tasks.

"Runtime continual learning algorithm" is an elusive target of questionable desirability - given that we already have in-context learning, and "get better at SFT and RLVR lmao" is relatively simple to pull off and gives kickass gains in the here and now.

I see no reason why "behave in a more human/animal-like self-motivated agentic fashion" can't be obtained from more RLVR, if that's what you want to train your LLMs for.


I'm not sure what you are saying. There are LLMs as exist today, and there are any number of changes one could propose to make to them.

The less you change, the more they stay the same. If you just add "more" RLVR (perhaps for a new domain - maybe chemistry vs math or programming?), then all you will get is an LLM that is better at acing chemistry reasoning benchmarks.


I'm saying that the kind of changes you propose aren't made by anyone, and might generally not be worth making. Because "better RLVR" is an easier and better pathway to actual cross-domain performance gains.

If you could stabilize the kind of mess you want to make, you could put that effort into better RL objectives and get more return.


The mainstream LLM crowd aren't making these sorts of major changes yet, although some, like DeepMind (the OG pushers of RL for AGI!), do acknowledge that a few more "transformer level" breakthroughs are necessary to reach what THEY are calling AGI, and others like LeCun are calling for more animal-like architectures.

Anyways, regardless of who is currently trying to move beyond LLMs or not, it should be pretty obvious what the problems are with trying to apply RL more generally, and what that would result in if successful, if that were the only change you made.

LLMs still have room to get better, but they will forever be LLMs, not brains, unless someone puts in the work to make that happen.

You started this thread talking about "AGI", without defining what you meant by that, and are now instead talking about "cross-domain performance gains". This is exactly why it makes no sense to talk about AGI without defining what you mean by it, since I think we talking about completely different things.


The claim I make is that LLMs can be AGI complete with pretty much zero architectural work. And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are - the issue isn't "the architecture is wrong", it's "we don't know how to train for this kind of thing".

"Fundamental limitations" aren't actually fundamental. If you want more learning than what "in-context" gives you? Teach the usual "CLI agent" LLM to make its own LoRAs and there goes that. So far, this isn't a bottleneck so pressing you'd want to resolve it by force.

LeCun is a laughing stock nowadays; he didn't get kicked out of Meta for no reason.


You keep using the term "AGI" without defining what you mean by it, other than implicitly defining it as "whatever can be achieved without changing the Transformer architecture", which makes your "claim" just a definitional tautology, which is fine, but it does mean you are talking about something different than what I am talking about, which is also fine.

> And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are

I've no idea what projects you are referring to.

It would certainly be bizarre if the Transformer architecture, never designed to be a brain, turns out to be the best brain we can come up with, and equal to real brains which have many more moving parts, each evolved over millions of years to fill a need and improve capability.

Maybe you are smarter than Demis Hassabis and the DeepMind team, and all their work towards AGI (their version, not yours) will be a waste of effort. Why not send him a note: "hey, dumbass, Transformers are all you need!"?


It would certainly be bizarre if the 8086 architecture, never designed to be the foundation of all home, office and server computation, was the best CPU architecture ever made.

And it isn't. It's merely good enough.

That's what LLMs are. A "good enough" AI architecture.

By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that. They have limitations, but not the kind that can't be worked around with things like sharply applied tool use. Which LLMs can be trained for, and are.

So far, all the weirdo architectures that try to replace transformers, or put brain-inspired features into transformers, have failed to live up to the promise. Which sure hints that the bottleneck isn't architectural at all.


I'm not aware of any architectures that have tried to put "brain-inspired" features into Transformers, or much attempt to modify them at all for that matter.

The architectural Transformer tweaks that we've seen are:

- Various versions of attention for greater efficiency

- MOE vs dense for greater efficiency

- Mamba (SSM) + transformer hybrid for greater efficiency

None of these are even trying to fundamentally change what the Transformer is doing.

Yeah, the x86 architecture is certainly a bit of a mess, but as you say good enough, as long as what you want to do is run good old fashioned symbolic computer programs. However, if you want to run these new-fangled neural nets, then you'd be better off with a GPU or TPU.

> By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that.

I think DeepMind are right here, and you're wrong, but let's wait another year or two and see, eh?



