I referred to this paper a lot when it was hyped, back when people cared about architectural decisions in neural networks. That was also the year I started studying neural networks.
I think the idea still holds. Although interest has shifted towards test-time scaling and thinking, researchers still care about architectures, as with the recently published Nemotron 3.
Can anyone share updates on this direction of research, or point to more recent papers?