
> haha good one, so why haven't they done this yet? What are they waiting for? Let's see these super advanced "experts" with "specialized models"!!

I understand it's very easy to post ignorant messages in internet forums, but the answer to your question is yes: they have done it, and it does result in cheaper training costs. See models such as DeepSeek-MoE or Mixtral.

https://github.com/deepseek-ai/DeepSeek-MoE

https://mistral.ai/news/mixtral-of-experts
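The cost saving comes from sparse routing: a mixture-of-experts layer holds many expert feed-forward networks but only runs the top-k of them per token, so compute scales with k while parameter count scales with the number of experts. Here is a minimal NumPy sketch of that idea (all names, sizes, and the softmax top-2 gating scheme are illustrative, not the actual DeepSeek-MoE or Mixtral implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 64       # model width, expert hidden width
n_experts, top_k = 8, 2      # 8 experts, but only 2 run per token
n_tokens = 4

# Each "expert" is a small two-layer feed-forward net, as in a Transformer FFN.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1  # router weights

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x):
    """Sparse MoE layer: route each token to its top_k experts only."""
    probs = softmax(x @ gate_w)              # (n_tokens, n_experts)
    out = np.zeros_like(x)
    expert_calls = 0
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]              # chosen experts
        weights = probs[t, top] / probs[t, top].sum()    # renormalized gates
        for w, e_idx in zip(weights, top):
            w1, w2 = experts[e_idx]
            h = np.maximum(x[t] @ w1, 0.0)               # ReLU FFN
            out[t] += w * (h @ w2)
            expert_calls += 1
    return out, expert_calls

x = rng.standard_normal((n_tokens, d_model))
y, calls = moe_forward(x)
print(y.shape)   # (4, 16)
print(calls)     # 8 = n_tokens * top_k, vs 32 if all experts ran
```

Only 8 expert evaluations happen for 4 tokens instead of 32, which is the training-cost win the linked models exploit at scale.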


