
Two reasons: they were only implemented in large production models fairly recently, and they aren't part of the standard ML-Coursera curriculum. And frankly, for half the papers claiming that their particular efficiency variant reduces the O(n^2) attention cost without performance loss, we found that in practice it isn't quite so shiny.

Anyone who has, for whatever reason, been reading the literature since 2017 has invariably come across dozens of these papers.

Anyone who first heard of GPT-x in 202x and started from there probably hasn't.

This will likely change as memory retrieval, some form of linear attention, etc. land in many production models, and as some decoder models get democratized... although I have been thinking this for a while.
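(For the people in the second group: here's a minimal numpy sketch of the linear-attention idea, in the spirit of the kernelized/linear transformer papers. The feature map phi below is just an illustrative placeholder, not any specific paper's choice; the point is only that reassociating the matmuls avoids materializing the n x n score matrix.)

    import numpy as np

    def softmax_attention(Q, K, V):
        # standard attention: materializes the n x n score matrix -- the O(n^2) part
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return (w / w.sum(axis=-1, keepdims=True)) @ V

    def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
        # kernel trick: phi(Q) @ (phi(K).T @ V) reassociates the matmuls,
        # so the cost is O(n * d^2) instead of O(n^2 * d) -- no n x n matrix
        Qp, Kp = phi(Q), phi(K)
        KV = Kp.T @ V            # (d, d), independent of sequence length
        Z = Qp @ Kp.sum(axis=0)  # per-query normalizer, shape (n,)
        return (Qp @ KV) / Z[:, None]

    n, d = 512, 64
    rng = np.random.default_rng(0)
    Q, K, V = rng.standard_normal((3, n, d))
    out = linear_attention(Q, K, V)  # rough stand-in for softmax_attention(Q, K, V)

Whether that approximation actually holds up at production quality is exactly the part that often isn't as shiny as the papers claim.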

Don't get me wrong, you want to hire the people who know these papers, especially if they started after 2017 :-)



I have a team of bioinformaticians with very little ML knowledge, but even they know these basic papers... So yeah, if someone claims to be in ML and isn't aware of this foundational knowledge... they're not to be taken seriously.



