
This does look like a large relative increase in score, but it seems to come from going from zero correct out of 6 to one and a half correct. I think it's fair to say the sample size here is quite small. Still, a record is a record! Congrats to the team!


From my own small sample (tens of queries per day), Gemini 2.5 seems like a noticeable improvement in (almost) every way compared to previous Gemini models.

Answers do seem to take longer to generate, but it's well worth the cost.