It is interesting to see the "DeepMind" branding completely vanish from the post. This feels like the final consolidation of the Google Brain merger. The technical report mentions a new "MoE-lite" architecture. Does anyone have details on the parameter count? If this is under 20B active params, the distillation techniques they are using are light-years ahead of everyone else's.
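
For anyone wondering what "active" means here: in a top-k-routed MoE, each token only passes through k of the experts per layer, so per-token compute scales with the active count rather than the total. Rough back-of-the-envelope sketch (all shapes made up, nothing here is from the report):

    # Hypothetical shapes -- the report doesn't publish these. This just
    # illustrates how "active" params differ from total in a top-k MoE:
    # each token is routed to only top_k of the n_experts FFN blocks.
    def moe_param_counts(d_model, d_ff, n_layers, n_experts, top_k,
                         vocab=256_000):
        attn = 4 * d_model * d_model    # QKV + output projections per layer
        expert = 2 * d_model * d_ff     # one FFN expert (up + down proj)
        emb = vocab * d_model           # embedding table
        total = n_layers * (attn + n_experts * expert) + emb
        active = n_layers * (attn + top_k * expert) + emb
        return total, active

    total, active = moe_param_counts(d_model=4096, d_ff=16384,
                                     n_layers=48, n_experts=64, top_k=2)
    print(f"total: {total/1e9:.1f}B  active: {active/1e9:.1f}B")
    # -> total: 416.6B  active: 17.2B

So a model like that would report ~17B active even while being >400B total, which is why "parameter count" is ambiguous for MoE models unless you say which number you mean.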

