
Which is the go-to leaderboard for determining which AI model is best at answering devops / computer science questions and generating code? Wondering where Claude falls on this.

Recently canceled my OpenAI subscription because of too much lag and too many crashes. Switched to Gemini because its web interface is faster and rock solid. Makes me think the OpenAI backend and frontend engineers don't know what they are doing compared to the Google engineers.



chat.lmsys.org --> "Leaderboard" tab --> "Coding" drop-down selection

Or the scale.ai private benchmarks


One extensive benchmark I like is https://bigcode-bench.github.io/

It places Claude 3.5 Sonnet in third position.



