
There won't ever be newer training data.

The OG data came from sites like Stack Overflow. These sites will stop existing once LLMs become better and easier to use. Game over.



Every time Claude Code runs tests or builds after a change, it's collecting training data.


You need human language programming-related questions to train on too, not just the code.


That's what the related chats are for?


And now you're training LLMs on LLM output.

No, you need something like Stack Overflow. The crowdsourced rating system that Stack Overflow has (had?) is the crucial part.




I can't pretend to know how things work internally, but I would expect it to be involved in model updates.



