DeepSeek's Native Sparse Attention (NSA) paper was published in February: https://arxiv.org/abs/2502.11089
DeepSeek-V3.2-Exp (combining MLA and DSA) was released in September.
You also had several other Chinese hybrid models, like Qwen3-Next and MiniMax-M1.