
One thing that stands out playing with the sorting is that Google's Gemini claims to have a context window more than 10x that of most of its competition. Has anyone experimented with this to see if its useful context window is actually anything close to that?

In my own experiments with the chat models they seem to lose the plot after about 10 replies unless constantly "refreshed", which is a tiny fraction of the supposed 128,000-token input length that 4o has. Does Gemini actually do something dramatically differently, or is their 3 million token context window pure marketing nonsense?



https://github.com/NVIDIA/RULER — in this benchmark, results on tasks other than needle-in-a-haystack seem solid all the way to 128k.
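For anyone unfamiliar: a needle-in-a-haystack test plants a unique fact at some depth inside a long stretch of filler text, then asks the model to retrieve it. A minimal sketch of generating such a probe (function and parameter names are mine, not RULER's):

```python
def build_haystack(needle: str, filler: str, n_fillers: int, depth: float) -> str:
    """Embed a 'needle' sentence at a relative depth within repeated filler.

    depth is the relative position: 0.0 = start of the context, 1.0 = end.
    """
    lines = [filler] * n_fillers
    pos = int(depth * len(lines))
    lines.insert(pos, needle)
    return "\n".join(lines)

needle = "The magic number for the experiment is 417."
filler = "The quick brown fox jumps over the lazy dog."
prompt = build_haystack(needle, filler, n_fillers=1000, depth=0.5)
# The model is then asked "What is the magic number?" and scored on recall,
# sweeping both context length and needle depth.
```

RULER goes further than this single-needle setup (multi-needle, variable tracking, aggregation tasks), which is why its non-needle results are the more interesting signal.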


Thanks, this is exactly the kind of info I was hoping existed.


When they released it, they specifically focused on accurate recall across the context window. There are a bunch of demos of things like giving it a whole movie as input (a frame every N seconds plus the script, or something) and asking for highly specific facts.

Anecdotally, I use NotebookLM a bit, and while that’s probably RAG plus large contexts (to be clear, this is a guess not based on inside knowledge), it seems very accurate.


What tactics do you use to refresh while using them?


I tend to use a sentence along these lines: "Give me a straightforward summary of what we discussed so far, so that someone who didn't read the above would understand the details. Don't be too verbose."

Then I just continue from there, or use the summary as a seed in a fresh chat.
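That tactic is easy to mechanize with any chat-completion API. A sketch, where `ask_model` is a placeholder for whatever client you use (it takes a message list and returns the assistant's reply as a string):

```python
SUMMARY_PROMPT = (
    "Give me a straightforward summary of what we discussed so far, "
    "so that someone who didn't read the above would understand the "
    "details. Don't be too verbose."
)

def refresh_context(history: list[dict], ask_model) -> list[dict]:
    """Collapse a long chat history into a single seed message.

    Asks the model to summarize the conversation, then returns a fresh
    history containing only that summary, ready to continue from.
    """
    summary = ask_model(history + [{"role": "user", "content": SUMMARY_PROMPT}])
    return [{"role": "system", "content": "Context so far: " + summary}]
```

The trade-off is lossy: details the summary omits are gone for good, which is why manually re-stating forgotten requirements (as the sibling comment describes) is sometimes still necessary.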


I don't have a strategy that I like—it just amounts to having to say "you forgot about requirement X, try again keeping that in mind".



