Not exactly. MoE uses a router model to select a subset of layers per token. Thi...

		irthomasthomas 28 days ago \| parent \| context \| favorite \| on: The RAM shortage comes for us all Not exactly. MoE uses a router model to select a subset of layers per token. This makes them faster but still requires the same amount of RAM.