
> I don't understand why Hacker News is so dismissive about the coming of LLMs

I find LLMs incredibly useful, but if you were following along the last few years, the promise was for “exponential progress” with a teased world-destroying superintelligence.

We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.

I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is that you don’t need to argue about it).

I'm not sure I understand: we are _objectively on that path_ -- we are increasing exponentially on a number of metrics that may be imperfect but seem to paint a pretty consistent picture. Scaling laws are exponential. METR's time horizon benchmark is exponential. Lots of performance measures are exponential, so why do you say we're objectively not on that path?

> We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.

Again, if it is "very clear", can you point to some concrete examples to illustrate what you mean?

> I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)

OK but what specifically do you have an issue with here?


> exponential progress

First you need to define what it means. What's the metric? Otherwise it's very much something you can argue about.


Time spent being human and enjoying life.

I can’t point at many problems it has meaningfully solved for me. I mean real problems, not tasks that I have to do for my employer. It seems like it just made parts of my existence more miserable, poisoned many of the things I love, and generally made the future feel a lot less certain.


> What's the metric?

Language model capability at generating text output.

The model progress this year has been a lot of:

- “We added multimodal”

- “We added a lot of non-AI tooling” (i.e. agents)

- “We put more compute into inference” (i.e. thinking mode)

So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next-gen models are significantly harder to build.

Simultaneously we see a distinct narrowing between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings.

That's usually a signal that the rate of progress is slowing.

Remind me, what was so great about GPT-5? How about GPT-4 over GPT-3?

Do you even remember the releases? Yeah. I don't. I had to look it up.

Just another model with more or less the same capabilities.

“Mixed reception”

That is not what exponential progress looks like, by any measure.

The progress this year has been in the tooling around the models and in smaller, faster models with similar capabilities. Multimodal add-ons that no one asked for, because it's easier to add image and audio processing than to improve text handling.

That may still be on a path to AGI, but it is not an exponential path to it.


> Language model capability at generating text output.

That's not a metric; that's a vague, non-operationalized concept that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another (well, actually, one that was linear in one would also be linear in an infinite number of others, as well as exponential in an infinite number of others).

That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.
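As a concrete illustration, here's a minimal Python sketch (the two "metrics" and all numbers are invented purely for illustration) where the very same underlying progress is linear under one metric and exponential under a monotone transform of it:

    # The same underlying progress series, viewed through two metrics.
    # Both metrics and all numbers here are invented for illustration.
    import math

    years = range(6)
    score = [10 * t for t in years]              # linear in "benchmark score"
    horizon = [math.exp(s / 10) for s in score]  # exponential in "task horizon"

    for t, s, h in zip(years, score, horizon):
        print(f"year {t}: score={s:2d}  horizon={h:8.1f}")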


I don’t think the path was ever exponential, but your claim here reads almost as if the slowdown hit an asymptote-like wall.

Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements of this, so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is, and we invalidated the Turing test, also based on vibes and feels.

So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.


I kind of agree in principle, but there are a multitude of clever benchmarks that try to measure lots of different aspects like robustness, knowledge, understanding, hallucinations, tool-use effectiveness, coding performance, multimodal reasoning and generation, etc. All of these have lots of limitations, but they paint a pretty compelling picture that complements the “vibes”, which are also important.

> Language model capability at generating text output.

How would you put this on a graph?


> Language model capability at generating text output.

That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.

> next gen models are significantly harder to build.

That's not how we judge capability progress though.

> Remind me what was so great about gpt 5? How about gpt4 from from gpt 3?

> Do you even remember the releases?

At the GPT-3 level, we could generate some reasonable code blocks / tiny features. (An example passed around at the time was "explain what this function does" for a fib(n).) At GPT-4, we could build features and tiny apps. At GPT-5, you can often one-shot build whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?

> Multimodal add ons that no one asked for

Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.


Exactly. GPT-5 was unimpressive not because of its leap from GPT-4 but because of expectations set by the string of releases since GPT-4 (especially the reasoning models). The leap from 4 -> 5 was actually massive.

Next-gen models are always hard to build; they are by definition pushing the frontier. Every generation of CPU was hard to build, but we still had Moore's law.

> Simultaneously we see a distinct narrowing between players (OpenAI, DeepSeek, Mistral, Google, Anthropic) in their offerings. That's usually a signal that the rate of progress is slowing.

I agree with you on the observation in the first part but not the conclusion in the second… why would convergence in performance between players indicate anything about the absolute performance improvements of frontier models?

> Remind me, what was so great about GPT-5? How about GPT-4 over GPT-3? Do you even remember the releases? Yeah. I don't. I had to look it up.

3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else

> Just another model with more or less the same capabilities.

GPT-5 is absolutely not a model with more or less the same capabilities as GPT-4; what could you mean by this?

> “Mixed reception”

A mixed reception is an indication of model performance against a backdrop of market expectations, not against GPT-4…

> That is not what exponential progress looks like, by any measure.

Sure it is… exponential is a constant % improvement per year. We’re absolutely in that regime by a lot of measures.

> The progress this year has been in the tooling around the models, smaller faster

Effective tool use is not some trivial add-on; it is a core capability for which we are on an exponential progress curve.

> models with similar capabilities. Multimodal add ons that no one asked for, because its easier to add image and audio processing than improve text handling.

This is definitely a personal feeling of yours; multimodal models are not something no one asked for… they are absolutely essential. Text data is essential, data curation is non-trivial and continually improving, and we are hitting the ceiling of internet text data. Yet we use an incredible amount of synthetic data for RL, and this continues to grow… you guessed it, exponentially. And multimodal data is incredibly information-rich. Adding multimodality lifts all boats and provides core capabilities necessary for open-world reasoning and even better text data (e.g. understanding charts and image context for text).


> exponential is a constant % improvement per year

I suppose if you pick a low enough growth rate, the exponential curve is flat for a long time, and you're right: zero progress is “exponential” if you cherry-pick your growth rate to be low enough.

Generally, though, people understand “exponential growth” as “getting better/bigger faster and faster in an obvious way”.
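To make that concrete with a quick sketch (growth rates invented for illustration): two series each grow by a constant percentage per year, so both are "exponential" by definition, but only one looks like it over a short window.

    # Two constant-%-per-year series: both exponential by definition,
    # but the low-rate one looks flat over a short window.
    # Growth rates are invented for illustration.
    for rate in (0.03, 1.00):
        series = [100 * (1 + rate) ** t for t in range(6)]
        print(f"{rate:4.0%}/yr:", [round(x, 1) for x in series])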

> 3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else

They objectively were not.

The metrics and the reception to them were very clear and overwhelming.

You're spitting some meaningless revisionist BS here.

You're wrong.

That's all there is to it.


It doesn’t sound like you’re really interested in any sort of rational dialogue. Metrics were “objectively” not better? What are you talking about? Of course they were; have you even looked at the benchmark progression for every benchmark we have?

You don’t understand what an exponential is, or apparently what the benchmark numbers even are, or possibly even how we actually measure model performance and the very real challenges and nuances involved, yet I’m “spitting some revisionist BS”. You have cited zero sources and are calling measured numbers “revisionist”.

You are also citing reception to models as some sort of indication of their performance, which is yet another confusing part of your reasoning.

I do agree that the “metrics were very clear”; it just seems you don’t happen to understand what they are or what they mean.


Define it however you like. There's not a single chart you can draw that even begins to look like a sigmoid.

>following along the last few years the promise was for “exponential progress”

I've been following for many years, and the main exponential thing has been the Moore's-law-like growth in compute. Compute per dollar is probably the best-tracked one, and it has doubled steadily every couple of years or so for decades. It's exponential, but quite a leisurely exponential.
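To put numbers on how even a leisurely exponential compounds, a quick sketch using that rough doubling-every-two-years rate:

    # Compute per dollar doubling every ~2 years (a rough Moore's-law-ish
    # rate), compounded over increasingly long spans.
    for years in (2, 10, 20, 40):
        print(f"{years:2d} years: {2 ** (years / 2):>11,.0f}x compute per dollar")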

The recent hype of the last couple of years is more dot-com-bubble-like, running ahead of trend, and will quite likely drop back.


I’ve been reading this comment multiple times a week for the last couple years. Constant assertions that we’re starting to hit limits, plateau, etc. But a cursory glance at where we are today vs a year ago, let alone two years ago, makes it wildly obvious that this is bullshit. The pace of improvement of both models and tooling has been breathtaking. I could give a shit whether you think it’s “exponential”, people like you were dismissing all of this years ago, meanwhile I just keep getting more and more productive.

People keep saying stuff like this. That the improvements are so obvious and breathtaking and astronomical, and then I go check out the frontier LLMs again and they're maybe a tiny bit better than they were last year, but I can't actually be sure because it's hard to tell.

Sometimes it seems like people are just living in another timeline.


You might want to be more specific because benchmarks abound and they paint a pretty consistent picture. LMArena "vibes" paint another picture. I don't know what you are doing to "check" the frontier LLMs but whatever you're doing doesn't seem to match more careful measurement...

You don't actually have to take people's word for it: read epoch.ai developments, look into the benchmark literature, look at ARC-AGI...


I’m genuinely curious what your “checking the frontier LLMs” looks like, especially if you haven’t used AI since last year.

"maybe a tiny bit better" is what you say when you've been tricked by snake oil salesman

This shit has gotten worse since 2023.


> This shit has gotten worse since 2023.

I would really appreciate it if people could be specific when they say stuff like this, because it's so crazily out of line with all measurement efforts. There are an insane number of serious problems with current LLM / agentic paradigms, but the idea that things have gotten worse since 2023? I mean, come on.


You’re responding to a troll who just has a nasty, bitter axe to grind against AI. It’s honestly pretty sad and pathetic.

> but we’re very clearly seeing sigmoid progress.

Yeah, probably. But no chart actually shows it yet. For now we are firmly in the exponential zone of the sigmoid curve and can't really tell if it's going to end in a year, a decade, or a century.
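A minimal sketch of why no chart can settle it yet (all parameters invented for illustration): early samples of a logistic (sigmoid) process are nearly indistinguishable from a pure exponential, and they only diverge as the ceiling approaches.

    # Early logistic (sigmoid) growth vs. a pure exponential: nearly
    # identical at first, diverging only near the ceiling.
    # All parameters are invented for illustration.
    import math

    def logistic(t, cap=100.0, rate=1.0, t0=5.0):
        return cap / (1 + math.exp(-rate * (t - t0)))

    def exponential(t, a=0.67, rate=1.0):
        return a * math.exp(rate * t)

    for t in range(9):
        print(f"t={t}: sigmoid={logistic(t):7.2f}  exponential={exponential(t):9.2f}")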


It doesn't even matter if the ceiling is extremely high. Talking about exponential growth when we can clearly see the matching energy needs proves there is no way we can maintain that pace without radical (and thus unpredictable) improvements.

My own "feeling" is that it's definitely not exponential, but again, it doesn't matter if it's unsustainable.


I wrote an article complaining about the whole hype over a year ago:

https://chrisfrewin.medium.com/why-llms-will-never-be-agi-70...

Seems to be playing out that way.


We're very clearly seeing exponential progress - even above trend on METR, whose slope keeps getting revised to a higher and higher estimate each time. Can you explain your perspective on the objective evidence against exponential progress?

Pretty neat how this exponential progress hasn't resulted in exponential productivity. Perhaps you could explain your perspective on that?

Because that requires adoption. Devs on Hacker News are already the most up-to-date folks in the industry, and even here adoption of LLMs is incredibly slow. And a lot of the adoption that does happen is still with older tech like ChatGPT or Cursor.

What’s the newer tech?

Claude Code with Opus 4.5

Writing the code itself was never the main bottleneck. Designing the bigger solution, figuring out tradeoffs, talking to affected teams, etc. takes as much time as it used to. But still, there's definitely a significant improvement in the code-production part in many areas.

I think this is still an open question, and a very interesting one. Ilya discussed this on the Dwarkesh podcast. But the capability growth of LLMs is clearly exponential, and perhaps super-exponential. We went from something that could barely string together coherent text in 2022 to general models helping people like Terence Tao and Scott Aaronson write new research papers. LLMs also beat the IMO and the ICPC. We have entered the John Henry era for intellectual tasks...

> LLMs also beat IMO and the ICPC

Very spurious claims, given that there was no effort made to check whether the IMO or ICPC problems were in the training set or not, or to quantify how far problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since the IMO is not at the frontier of math research, there is no guarantee that the same problem, or something very similar, wasn't solved in some obscure manual.


> But the capabilities of LLMs is clearly exponential and perhaps super exponential

By what metric?


- Scaling laws (Chinchilla type)

- METR task horizon

It's a mix; performance gains are bursty, but we have been getting a lot of bursts (RLVR, test-time compute, agentic breakthroughs).
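For reference, the Chinchilla-type scaling law mentioned above is a power law in parameter count N and training tokens D; here's a sketch using the approximate fitted constants from the Chinchilla paper (Hoffmann et al., 2022), though exact values vary by fit.

    # Chinchilla-type scaling law: predicted pretraining loss as a
    # function of parameter count N and training tokens D.
    # Constants are approximate fitted values from Hoffmann et al. 2022.
    def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    # Note the power-law form: returns diminish smoothly as N and D grow,
    # so whether progress looks "exponential" depends on the axis chosen.
    for N, D in [(1e9, 20e9), (70e9, 1.4e12), (1e12, 20e12)]:
        print(f"N={N:.0e}, D={D:.0e}: loss ~ {loss(N, D):.3f}")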


ChatGPT told him it was true

BS metric... /s

It has! CLs/engineer increased by 10% this year.

LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.

Also, I'm noticing you're not explaining yourself :)


I think this is happening by raising the floor for job roles that are largely boilerplate work. If you are on the more skilled side or work in more original/niche areas, AI doesn't really help too much. I've only been able to use AI effectively for scaling refactors, not really much in feature development. It often just slows me down when I try to use it. I don't see this changing any time soon.

Hey, I'm not the OG commentator, why do I have to explain myself! :)

When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?


> Hey, I'm not the OG commentator, why do I have to explain myself! :)

The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed through the meat of the comment and then just re-phrased your original idea.

> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?

This comparison doesn't make sense because we know the limits of cars, but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it to the speed of light in 20 seconds is not an open question.


It's not on me to somehow disprove claims of exponential growth when there isn't even evidence provided for them.

My point with the F1 comparison is to say that a short period of rapid improvement doesn't imply exponential growth, and it's about as weird to expect that as it is to expect an F1 car to reach the speed of light. It's possible, you know; the regulations are changing for next season - if Leclerc sets a new lap record in Australia by .1 ms, we can just assume exponential improvements, and surely Ferrari will be lapping the rest of the field by the summer, right?


There is already evidence provided of it! METR time horizons are going up on an exponential trend. This is literally the most famous AI benchmark, and it was already mentioned in this thread.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-...
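For a sense of what that trend claims, a quick sketch (the ~7-month doubling time is roughly what the first METR post reports; the 1-hour starting horizon is an arbitrary assumption):

    # METR-style time-horizon trend: tasks completed at 50% reliability,
    # with the horizon doubling every ~7 months (approximate figure from
    # the METR report). The 1-hour starting point is an assumption.
    doubling_months = 7
    start_hours = 1.0
    for months in range(0, 29, 7):
        horizon = start_hours * 2 ** (months / doubling_months)
        print(f"+{months:2d} months: ~{horizon:5.1f}h task horizon")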


If you're not going to explain yourself, at least stay on topic. We're talking about exponential growth, so address the points I'm making.

I'm noticing you're not responding to my claim that productivity hasn't been impacted

LLMs a year ago were more capable of doing a complex project I've repeatedly tried than they are now.

Try Antigravity with Gemini 3 Pro. Seems very capable to me.

How long before the introduction of computers led to increases in average productivity? How long for the internet? Business is just slow to figure out how to use anything to its benefit, but it eventually gets there.

The best example is that even ATMs didn't reduce bank teller jobs.

Why? Because even the bank teller is doing more than taking and depositing money.

IMO there is an ontological bias pervading our modern society, one that mistakes the map for the territory and has a highly distorted view of human existence through the lens of engineering.

We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:

https://fred.stlouisfed.org/series/PRS85006092

As if the sum of human interaction in an economy were some kind of machine that we just need to engineer better parts for and then sum the outputs.

Any non-careerist, thinking person who studies economics would conclude we don't, and probably won't, have the tools to properly study this subject in our lifetimes: the high-dimensional interaction of biology, entropy, and time. We have nothing. The career economist is essentially forced to sing for their supper in a kind of time-series theater. Then there is the method acting of pretending to be surprised when some meaningless, reductionist aspect of human interaction isn't reflected in the fake time series.


> How long before introduction of computers lead to increases in average productivity?

I think it never did. Still has not.

https://en.wikipedia.org/wiki/Productivity_paradox


Sir, we're in a modern economy; we don't ever, ever look at productivity graphs (this is not to disparage LLMs, just a comment on productivity in general)


