Hacker News
AI Fails at 96% of (General Work) Jobs (New Study) (youtube.com)
21 points by swolpers 15 days ago | 8 comments


Actual paper: https://www.remotelabor.ai/paper.pdf

Sounds about right.

With those test parameters for how long a human would take to complete the same work, the result fits a pattern similar to METR's: at "humans would take 11.5 hours" (Figure 4, median), you're pushing your luck for any success with all but the most recent models*. And METR is testing software, where AI has the possibility of fully automating a lot of its own tests.

Even models more recent than the ones they tested, like Opus 4.5, are only 50% successful at tasks that take humans 5h20m: https://metr.org/time-horizons/

Assuming the bubble doesn't pop and WW3 doesn't start first (IDK, 25% and 5% respectively?), and if trends continue (???), I expect a similar paper this time next year to show something like 50% success at automating similar tasks.

* Which they didn't test; I don't blame them for that, because this field moves too fast.




Translation: "96% of people trying to replace workers with AI don't know how to prompt it effectively or supervise its output."


The 4% are using it to write posts about AI on LinkedIn.


So what you're saying is the interface fails the common case?


Or they've determined that micromanaging it is circuitous and increases their dependence on tech giants, so it's a bad deal given that they also need to know the work well enough to verify it anyway.


96% are "holding it wrong".

There's a saying that if everywhere you go it smells like shit, you might just have some shit smeared on your own nose.

96% is not "holding it wrong".



