
> I mean, it's open source, so people can create benchmarks, independently verify whether the AI was wrong, and then pass their findings on to the author.

Thank you for volunteering. I look forward to your results.



> Thank you for volunteering. I look forward to your results.

Sure, can you wait a few weeks though? I know nothing about benchmarking, so I'm going to learn it first, and I have a few tests to prepare for IRL.

I do feel like someone else more passionate about the project should pick up the benchmarking, though.

I don't mind benchmarking it, but I only know tools like hyper for benchmarks. I have played with my fair share of zip archives and their random-access retrieval, but I feel like even that would vary from source to source.

There are some experienced people in here who are really good at what they do. I just wanted to say that if someone is interested, already has the domain-specific knowledge to benchmark, and enjoys it in the first place, verifying the AI's benchmarks shouldn't be much of a problem for them in comparison.



