Deepseek has updated its R1 model with a release that makes the open-weight system more competitive with leading AI models from major Western tech companies. The new version, Deepseek-R1-0528, does not change the original architecture, but Deepseek says improved algorithms and additional computing power have lifted performance across a wide range of tasks.
The headline change is reasoning. Deepseek describes the update as bringing "significantly improved depth of reasoning," and the company's reported benchmark results show gains in math, programming, general knowledge, and logic.
A reasoning upgrade without a new architecture
Deepseek-R1-0528 is notable because the model keeps the same core architecture while delivering higher scores. That matters because the improvement appears to come from training and optimization rather than a structural redesign.
On AIME 2025, accuracy rose from 70 to 87.5 percent. The model also appears to spend more of its output budget working through each prompt: average token use per prompt increased from 12,000 to 23,000.
That larger token footprint points to deeper analysis, at least in the way the model handles benchmark questions. Deepseek also says the update reduced hallucinations and expanded support for JSON output and function calling, two areas that matter for developers building systems around AI models.
Deepseek says all tests used standardized parameters and a maximum context length of 64,000 tokens.
Math and coding benchmarks move higher
The strongest reported gains appear in math and programming evaluations. On AIME 2024, accuracy improved from 79.8 to 91.4 percent. HMMT 2025 rose from 41.7 to 79.4 percent, while CNMO 2024 moved from 78.8 to 86.9 percent.
Programming results followed the same direction. LiveCodeBench increased from 63.5 to 73.3 percent. Aider-Polyglot climbed from 53.3 to 71.6 percent. SWE-bench Verified moved from 49.2 to 57.6 percent.
The model's Codeforces rating also rose from 1530 to 1930 points. Taken together, the reported results suggest that the update is not limited to one narrow category of test. It improves performance across problem-solving tasks where reasoning, precision, and multi-step output are central.
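To compare the size of these changes, the reported before-and-after scores can be turned into percentage-point gains. The short Python sketch below does only that arithmetic on the figures quoted in this article; the dictionary layout and function name are illustrative choices, not anything Deepseek publishes.

```python
# Scores as reported in this article: (previous R1, R1-0528), in percent.
reported = {
    "AIME 2024": (79.8, 91.4),
    "AIME 2025": (70.0, 87.5),
    "HMMT 2025": (41.7, 79.4),
    "CNMO 2024": (78.8, 86.9),
    "LiveCodeBench": (63.5, 73.3),
    "Aider-Polyglot": (53.3, 71.6),
    "SWE-bench Verified": (49.2, 57.6),
}


def point_gain(before, after):
    """Absolute change in percentage points, rounded to one decimal."""
    return round(after - before, 1)


gains = {name: point_gain(b, a) for name, (b, a) in reported.items()}
largest = max(gains, key=gains.get)
smallest = min(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name}: +{gain} points")
```

On these numbers, HMMT 2025 shows the largest jump and CNMO 2024 the smallest, though every benchmark in the list moved up by at least eight points.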
General knowledge gains are broader, but not universal
Deepseek's reported general knowledge and logic results also improved. GPQA-Diamond rose from 71.5 to 81.0 percent. Humanity's Last Exam more than doubled, from 8.5 to 17.7 percent. MMLU-Pro increased from 84.0 to 85.0 percent, and MMLU-Redux moved from 92.9 to 93.4 percent.
There was one reported decline. OpenAI's SimpleQA dropped from 30.1 to 27.8 percent. That exception is important because it shows the update did not raise every metric at once, even while the overall trend was strongly positive.
For developers and researchers, the pattern is still significant. The model appears better suited to longer, more detailed answers, and it adds practical improvements around structured output and function calling. Those are not just benchmark concerns; they affect how easily a model can be integrated into applications that need consistent formats or tool use.
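To illustrate why consistent formats matter in practice, here is a minimal sketch of the kind of check an application might run on a model's function-call output before acting on it. It is not Deepseek's API: the `TOOLS` table, the `get_weather` tool, and the `name`/`arguments` reply shape are all hypothetical, chosen only to show the idea using Python's standard `json` module.

```python
import json

# Hypothetical tool registry: tool name -> required argument keys.
# These names are illustrative and do not come from Deepseek's documentation.
TOOLS = {"get_weather": {"city"}}


def parse_tool_call(raw):
    """Return (name, arguments) if raw is a well-formed call, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # reply was not valid JSON at all
    if not isinstance(data, dict):
        return None
    name = data.get("name")
    args = data.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        return None  # unknown or missing tool name
    if not isinstance(args, dict):
        return None
    if not TOOLS[name].issubset(args):
        return None  # a required argument is missing
    return name, args


good = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(parse_tool_call(good))
print(parse_tool_call("Sure! The weather is nice."))
```

A model that reliably emits valid structured output passes checks like this more often, which means fewer retries and less defensive code in the calling application.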
Independent scoring puts Deepseek closer to frontier rivals
The independent platform Artificial Analysis also reported a major improvement. It gave Deepseek-R1-0528 a score of 68 on its Intelligence Index, up from 60 for the January version.
Artificial Analysis compared that eight-point jump to the move from OpenAI's o1 (62) to o3 (70). The updated Deepseek model is described as being in the same league as Google's Gemini 2.5 Pro.
Artificial Analysis currently ranks Deepseek-R1-0528 ahead of xAI's Grok 3 mini (high), Meta's Llama 4 Maverick, Nvidia's Nemotron Ultra, and Alibaba's Qwen3 235B. In coding, the model is reported to be just shy of OpenAI o4-mini (high) and o3.
The platform points to increased post-training with reinforcement learning as the main reason for the gains. It also reports that token usage in evaluation rose by 40 percent, from 71 to 99 million tokens, meaning the model now produces longer and more detailed answers during testing.
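The 40 percent figure can be checked with a quick calculation. The Python sketch below applies the same percent-increase formula to the Artificial Analysis token totals and, for comparison, to the per-prompt averages Deepseek reported for AIME 2025. The two figures come from different test setups, so they are not directly comparable; the helper name is an illustrative choice.

```python
def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Artificial Analysis evaluation totals, in millions of tokens.
eval_tokens = percent_increase(71, 99)

# Deepseek's reported average tokens per prompt on AIME 2025.
per_prompt = percent_increase(12_000, 23_000)

print(f"Evaluation tokens: +{eval_tokens:.1f}%")   # +39.4%
print(f"Tokens per prompt: +{per_prompt:.1f}%")    # +91.7%
```

The evaluation total works out to roughly 39 percent, consistent with the rounded 40 percent figure, while the per-prompt average on AIME 2025 nearly doubled.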
Open weights, smaller models, and licensing
The update also reinforces Deepseek-R1's position among open-weight models. Open models are reported to be closing in on proprietary US models, with Deepseek-R1 still leading the open-weight field.
Alongside the main release, Deepseek is releasing Deepseek-R1-0528-Qwen3-8B. This distilled model is built on Alibaba's Qwen3 8B and retrained on chain-of-thought outputs from R1-0528.
Deepseek says the compact model scores 86 percent on AIME 2024. That is ten points higher than the original Qwen3 8B and on par with the much larger Qwen3-235B-thinking. It is also designed to run efficiently on an Nvidia H100.
Deepseek frames the smaller model as evidence that reasoning-focused compact models can achieve competitive results while using far fewer resources. The company writes, "We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial development focused on small-scale models."
Licensing is another part of the release. Deepseek-R1-0528 is released under the MIT License, one of the most permissive open-source licenses available. It allows anyone to use, modify, and distribute the model, including for commercial projects, with almost no limitations.
Deepseek's Qwen-based models, including Deepseek-R1-0528-Qwen3-8B, are released under the Qianwen License. That license requires preservation of copyright and license notices, grants express patent rights, and allows modified or larger works to be redistributed under different terms, even without sharing source code.