DeepSeek V4 is a 1.6 trillion parameter open-weight Mixture-of-Experts model. It reads 1M tokens at once. It uses 27% of V3.2’s inference FLOPs and 10% of its KV cache. The DeepSeek V4 tech report credits three moves: hybrid CSA plus HCA attention, Manifold-Constrained Hyper-Connections, and the Muon optimizer in place of AdamW.
Key Takeaways
- DeepSeek V4 is a free, open-weight AI that goes toe-to-toe with the top closed models from OpenAI, Anthropic, and Google.
- It reads 1 million tokens in one prompt, enough for several full books or a long agent run without losing track.
- It runs on roughly a quarter of the compute its previous version needed, making long-context AI affordable to operate.
- A smaller team built it without access to top NVIDIA chips, proving clever engineering can rival raw GPU spend.
- It scored a perfect 120 out of 120 on the 2025 Putnam math competition and beats Google’s Gemini 3.1 Pro at 1M-token recall.
DeepSeek V4 at a Glance
The official launch announcement on April 24, 2026 framed the release as “the era of cost-effective 1M context length.” It shipped two checkpoints under the MIT license. DeepSeek-V4-Pro runs at 1.6T total and 49B active parameters. DeepSeek-V4-Flash runs at 284B total and 13B active. Both models read 1M tokens at once. Both ship as open weights on Hugging Face . The routed expert weights use FP4 math, and most other weights use FP8.
Botmonster Tech
