adverbly 11 months ago | on: Gemini 2.5 gets 24.4% on MathArena USAMO beating p...

This does look like a large relative increase in score, but it seems to come from going from zero correct out of 6 to 1 and 1/2 correct. I think it's fair to say the sample size here is relatively small. Still, a record is a record! Congrats to the team!

onlyrealcuzzo 11 months ago

From my small sample size (tens of queries per day), Gemini 2.5 seems like a noticeable improvement in (almost) every way compared to previous Gemini models.

Answers do seem to take longer to generate, but well worth the cost.
