agoodusername63 28 days ago | on: The RAM shortage comes for us all

Isn't that what the mixture-of-experts trick that all the big players use is? A bunch of smaller, tightly focused models.

irthomasthomas 27 days ago

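Not exactly. MoE uses a router to select a subset of experts (feed-forward sub-networks inside each MoE layer) for every token. This makes inference faster, since only the selected experts run, but every expert still has to be loaded, so it needs the same amount of RAM.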
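
To make the speed-versus-memory point concrete, here is a toy sketch in plain Python. It is not how any production model is implemented: `ToyMoELayer`, its method names, and the sizes used are all hypothetical, and each "expert" is just a small weight matrix standing in for a feed-forward block. It assumes simple top-k routing with softmax weights over the chosen experts.

```python
import math
import random


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical toy layer: a router plus num_experts small weight matrices."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # All experts are allocated up front, so all of them occupy memory.
        self.experts = [
            [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # Router: one row of scoring weights per expert.
        self.router = [
            [rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)
        ]

    def params_resident(self):
        # Parameters that must sit in memory regardless of routing.
        return len(self.experts) * self.dim * self.dim + len(self.router) * self.dim

    def params_used_per_token(self):
        # Parameters actually touched when processing one token.
        return self.top_k * self.dim * self.dim + len(self.router) * self.dim

    def forward(self, token):
        scores = matvec(self.router, token)
        # Keep only the top_k highest-scoring experts for this token.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = chosen[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * self.dim
        for weight, index in zip(weights, chosen):
            expert_out = matvec(self.experts[index], token)
            out = [o + weight * y for o, y in zip(out, expert_out)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, chosen = layer.forward([1.0, 0.5, -0.5, 2.0])
print("experts used for this token:", chosen)
print("parameters resident:", layer.params_resident())
print("parameters used per token:", layer.params_used_per_token())
```

With these made-up sizes, 160 parameters are resident but only 64 are touched per token. Which experts get picked changes from token to token, which is why the unused ones can't simply be left out of memory.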
