
I referred to this paper a lot back when it was hyped and people cared about the architectural decisions of neural networks. That was also the year I started studying neural networks.

I think the idea still holds. Although interest has shifted towards test-time scaling and thinking, researchers still care about architectures like Nemotron 3, which was published recently.

Can anyone share updates on this direction of research, or point to more recent papers?


