
Of course, we struggle to get humans to low error rates on large numbers of steps in sequence too, to the point where we devote vast amounts of resources to teaching discipline, using checklists, and doing audits and reviews to coax reliability out of an unreliable process.

So nobody should be surprised that this also applies to LLMs.

The issue is when people assume that a zero failure rate, or even close to zero, is necessary for utility, even though we don't need that from humans for humans to be useful for complex tasks.

For a whole lot of tasks, the acceptable error rate boils down to how costly errors are to work around, and that is a function of the error rate itself, the consequence of an error that slips past, and the cost of a "reliable enough" detector that lets us mitigate, to whatever extent is cost effective, by adding one or more detection steps.
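
As a toy sketch of that tradeoff (all numbers made up; "detector" here is any review step, automated or human, assumed independent per pass):

    # Toy expected-cost model: an error that slips past costs C_err, each
    # detection pass costs C_det and catches an error with probability d.
    # Stacking passes trades detector cost against residual error cost.
    def expected_cost(p_err, C_err, C_det, d, passes):
        residual = p_err * (1 - d) ** passes  # errors surviving every pass
        return passes * C_det + residual * C_err

    for n in range(4):
        print(n, expected_cost(p_err=0.10, C_err=100.0, C_det=1.0, d=0.9, passes=n))
    # -> 0 passes: 10.0, 1 pass: 2.0, 2 passes: 2.1, 3 passes: 3.01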

For a lot of uses, voting or putting the AI in a loop produces good enough results cheaply enough. For some, it will require models with lower error rates first.
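
For the voting case specifically, here's a minimal sketch, assuming independent samples (real LLM samples are correlated, so treat this as an optimistic bound): if each sample is wrong with probability p, a majority over n samples fails far less often.

    # Probability that a strict majority of n independent samples is wrong.
    from math import comb

    def majority_error(p, n):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    print(majority_error(0.10, 5))   # ~0.0086
    print(majority_error(0.10, 11))  # ~0.0003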

For some applications, sure, maybe solvers will be part of that, or in the mix, as will a lot of other tools. E.g. Claude likes to try to bisect when I ask it to fix a parser problem, and Claude is really bad at doing sensible bisection, so I had it write a dumb little bisection tool instead, and told it the steps to solve this type of problem, which include using that tool. So when we can have planning steps output "microsteps" that we can automate with more deterministic tools, we absolutely should.
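
Something along these lines, for the curious (a sketch, not the actual tool; it assumes a parses(text) -> bool predicate, that the empty prefix parses, and that once a prefix fails to parse, every longer prefix fails too):

    # Binary-search for the shortest prefix of the input that fails to parse.
    # Assumes failures are monotonic in prefix length.
    def first_failing_prefix(lines, parses):
        if parses("\n".join(lines)):
            return None  # whole input parses; nothing to bisect
        lo, hi = 0, len(lines)  # prefix of lo parses, prefix of hi fails
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if parses("\n".join(lines[:mid])):
                lo = mid
            else:
                hi = mid
        return hi  # line count of the shortest failing prefix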

Heck, the models themselves "like" to write tools to automate things if you give them long lists of tedious little tasks, to the point that it takes effort to stop them from doing it even when they have to write the tools themselves.



> The issue is when people assume that a zero failure rate, or even close to zero, is necessary for utility, even though we don't need that from humans for humans to be useful for complex tasks.

This argument doesn't carry because it is beside the point. Human vs. LLM utility parity isn't a sensible stop-goal for improvement. New technology isn't adopted for its legacy parity. Nor are there any specific technical barriers around human parity.

Fewer mistakes than humans, by definition, delivers unique value. People also want to spin up LLMs to handle tasks at scale in ways humans never could, where human level mistakes would be unacceptable.

So we very much do need LLMs (or whatever we call them tomorrow) to operate with lower error bars than humans. It is a reasonable demand. Lots of applications are waiting.

Given that demand, the value of avoiding any mistake, and the many people working on it, error rates will keep falling indefinitely.


> This argument doesn't carry because it is beside the point. Human vs. LLM utility parity isn't a sensible stop-goal for improvement. New technology isn't adopted for its legacy parity. Nor are there any specific technical barriers around human parity.

This is just utter nonsense. New technology is sometimes adopted because it is better, but just as often adopted even when the quality is strictly worse if it is cheaper.

But apart from that, you appear to be arguing against a point I never made, so it's not clear to me what the point of your response is.

> Fewer mistakes than humans, by definition, delivers unique value.

Yes, but that is entirely irrelevant to the argument I made.

> Given that demand, the value of avoiding any mistake, and the many people working on it, error rates will keep falling indefinitely.

And this is also entirely irrelevant to the point I made, and not something I've ever argued against.


> when the quality is strictly worse if it is cheaper

True. I stand corrected.


For a comprehensive rebuttal to this point of view, you may be interested in the works of W. Edwards Deming.

“No one knows the cost of a defective product - don't tell me you do. You know the cost of replacing it, but not the cost of a dissatisfied customer.” -Deming


No, I would not, as this argument is entirely irrelevant and doesn't address what I said.


> we struggle to get humans to low error rates on large numbers of steps in sequence too

Who said anything about AI vs humans? The contest in this context would be AI vs. classical deterministic code, algorithms, and solvers.

> how costly it is to work around .. a function of the error rate, consequence of an error that slips past, the cost of a "reliable enough" detector.. produces a good enough results cheap enough.

I mean, you're right, but only sort of. Someone can use this same argument to justify the assertion that bogosort is really the pinnacle of engineering excellence. How would you respond?


> Who said anything about AI vs humans?

I did, because it is a relevant comparison.

> The contest in this context would be AI vs. classical deterministic code, algorithms, and solvers.

No, it is not. In cases where we know how to solve things that way, we probably should do so, on the assumption that if those approaches can deliver good enough results, they are likely cheaper.

Those are not the things we generally are trying to use LLMs for.

> I mean, you're right, but only sort of. Someone can use this same argument to justify the assertion that bogosort is really the pinnacle of engineering excellence. How would you respond?

That it is an obviously specious argument, because we have clearly lower-cost sorting algorithms, so no, you can't use this same argument to justify that assertion.




