> haha good one, so why haven't they done this yet? What are they waiting for? Let's see these super advanced "experts" with "specialized models"!!
I understand it's very easy to post ignorant messages in internet forums, but the answer to your question is yes, "they have done it," and it does result in cheaper training costs: a mixture-of-experts model only activates a small subset of its parameters for each token, so the compute per token is far lower than for a dense model of the same total size. See models such as DeepSeek-MoE or Mixtral.
https://github.com/deepseek-ai/DeepSeek-MoE
https://mistral.ai/news/mixtral-of-experts
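If it helps, here's a minimal NumPy sketch of the core idea (top-k expert routing). All sizes, names, and the gating details are illustrative assumptions, not taken from either paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2   # illustrative sizes, not from either paper

# Router: a linear layer scoring each expert per token.
W_router = rng.normal(size=(d_model, n_experts))

# Each "expert" here is just a small feed-forward weight matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_router                           # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' scores.
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(4, d_model))
y = moe_forward(tokens)
print(y.shape)  # (4, 16)

# Only top_k / n_experts of the expert FLOPs are spent per token,
# which is why training can be cheaper than a dense model of equal parameter count.
```

The real models use learned routers trained jointly with load-balancing losses, but the per-token sparsity above is the mechanism behind the cost savings.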