Hacker News
qsort 6 months ago | on: AI agent benchmarks are broken
Not even that; see LMArena. They vaguely gesture in the general direction of the model being good, but between contamination and scoring issues they're little more than a vibe check.