
Of course, we struggle to get humans to low error rates on large numbers of steps in sequence too, to the point where we devote vast amounts of resources to teaching discipline, using checklists, and doing audits and reviews to coax reliability out of an unreliable process.

So nobody should be surprised that this also applies to LLMs.

The issue is when people assume that a zero failure rate, or even close to zero, is necessary for utility, even though we don't need that from humans for humans to be useful for complex tasks.

For a whole lot of tasks, the acceptable error rate boils down to how costly errors are to work around, and that is a function of the error rate itself, the consequence of an error that slips past, and the cost of a "reliable enough" detector that lets us mitigate, to whatever extent is cost effective, by adding one or more detection steps.
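
As a toy sketch of that tradeoff (all numbers made up; "detector" here is any review step, automated or human, assumed independent per pass):

    # Toy expected-cost model: an error that slips past costs C_err, each
    # detection pass costs C_det and catches an error with probability d.
    # Stacking passes trades detector cost against residual error cost.
    def expected_cost(p_err, C_err, C_det, d, passes):
        residual = p_err * (1 - d) ** passes  # errors surviving every pass
        return passes * C_det + residual * C_err

    for n in range(4):
        print(n, expected_cost(p_err=0.10, C_err=100.0, C_det=1.0, d=0.9, passes=n))
    # -> 0 passes: 10.0, 1 pass: 2.0, 2 passes: 2.1, 3 passes: 3.01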

For a lot of uses, voting or putting the AI in a loop produces good enough results cheaply enough. For some, it will require models with lower error rates first.
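
For the voting case specifically, here's a minimal sketch, assuming independent samples (real LLM samples are correlated, so treat this as an optimistic bound): if each sample is wrong with probability p, a majority over n samples fails far less often.

    # Probability that a strict majority of n independent samples is wrong.
    from math import comb

    def majority_error(p, n):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    print(majority_error(0.10, 5))   # ~0.0086
    print(majority_error(0.10, 11))  # ~0.0003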

For some applications, sure, maybe solvers will be part of that, or in the mix, as will a lot of other tools. E.g. Claude likes to try to bisect when I ask it to fix a parser problem, and Claude is really bad at doing sensible bisection, so I had it write a dumb little bisection tool instead, and told it the steps to solve this type of problem, which include using that tool. So when we can have planning steps output "microsteps" that we can automate with more deterministic tools, we absolutely should.
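
Something along these lines, for the curious (a sketch, not the actual tool; it assumes a parses(text) -> bool predicate, that the empty prefix parses, and that once a prefix fails to parse, every longer prefix fails too):

    # Binary-search for the shortest prefix of the input that fails to parse.
    # Assumes failures are monotonic in prefix length.
    def first_failing_prefix(lines, parses):
        if parses("\n".join(lines)):
            return None  # whole input parses; nothing to bisect
        lo, hi = 0, len(lines)  # prefix of lo parses, prefix of hi fails
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if parses("\n".join(lines[:mid])):
                lo = mid
            else:
                hi = mid
        return hi  # line count of the shortest failing prefix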

Heck, the models themselves "like" to write tools to automate things if you give them long lists of tedious little tasks, to the point that it takes effort to stop them from doing it even when they have to write the tools themselves.



> The issue is when people assume that a zero failure rate, or even close to zero, is necessary for utility, even though we don't need that from humans for humans to be useful for complex tasks.

This argument doesn't carry because it is beside the point. Human vs. LLM utility parity isn't a sensible stop-goal for improvement. New technology isn't adopted for its legacy parity. Nor are there any specific technical barriers around human parity.

Fewer mistakes than humans, by definition, delivers unique value. People also want to spin up LLMs to handle tasks at scale in ways humans never could, where human level mistakes would be unacceptable.

So we very much do need LLMs (or whatever we call them tomorrow) to operate with lower error bars than humans. It is a reasonable demand. Lots of applications are waiting.

Given that demand, the value of avoiding any mistake, and the many people working on it, error rates will keep falling indefinitely.


> This argument doesn't carry because it is beside the point. Human vs. LLM utility parity isn't a sensible stop-goal for improvement. New technology isn't adopted for its legacy parity. Nor are there any specific technical barriers around human parity.

This is just utter nonsense. New technology is sometimes adopted because it is better, but just as often adopted even when the quality is strictly worse if it is cheaper.

But apart from that, you appear to be arguing against a point I never made, so it's not clear to me what the point of your response is.

> Fewer mistakes than humans, by definition, delivers unique value.

Yes, but that is entirely irrelevant to the argument I made.

> Given that demand, the value of avoiding any mistake, and the many people working on it, error rates will keep falling indefinitely.

And this is also entirely irrelevant to the point I made, and not something I've ever argued against.


> when the quality is strictly worse if it is cheaper

True. I stand corrected.


For a comprehensive rebuttal to this point of view, you may be interested in the works of W. Edwards Deming.

“No one knows the cost of a defective product - don't tell me you do. You know the cost of replacing it, but not the cost of a dissatisfied customer.” -Deming


No, I would not, as this argument is entirely irrelevant and doesn't address what I said.


> we struggle to get humans to low error rates on large numbers of steps in sequence too

Who said anything about AI vs humans? The contest in this context would be AI vs. classical deterministic code, algorithms, and solvers.

> how costly it is to work around .. a function of the error rate, consequence of an error that slips past, the cost of a "reliable enough" detector.. produces a good enough results cheap enough.

I mean, you're right, but only sort of. Someone can use this same argument to justify the assertion that bogosort is really the pinnacle of engineering excellence. How would you respond?


> Who said anything about AI vs humans?

I did, because it is a relevant comparison.

> The contest in this context would be AI vs. classical deterministic code, algorithms, and solvers.

No, it is not. In cases where we know how to solve things that way, we probably should do so, on the assumption that if those approaches can deliver good enough results, they are likely cheaper.

Those are not the things we generally are trying to use LLMs for.

> I mean, you're right, but only sort of. Someone can use this same argument to justify the assertion that bogosort is really the pinnacle of engineering excellence. How would you respond?

That it is an obviously specious argument, because we have clearly lower-cost sorting algorithms, so no, you can't use this same argument to justify that assertion.




