The Decoder March 27, 2025 NEUTRAL

How DeepSeek-V3 raises the stakes for open-source AI

DeepSeek-V3-0324 appears to compete with some of the strongest language models available, including OpenAI's GPT-4.5 and Anthropic's Claude 3.7 Sonnet, on several benchmarks. The broader story is efficiency: Deepseek built a 671-billion-parameter model at an unusually low reported training cost and released it as open source under the MIT license.

WTF Index NEUTRAL

This is mainly a competitive open-source model launch, with only a mild power-diffusion angle and little direct evidence of harm or societal degradation.

Deepseek is putting new pressure on the AI market with DeepSeek-V3-0324, an open-source language model release that appears competitive with several leading systems. The model's benchmark performance, low reported training cost, and MIT license make it a notable challenge to the idea that only the largest, best-funded labs can build frontier-grade AI.

A stronger DeepSeek-V3 enters the race

Deepseek has released DeepSeek-V3-0324, a new version of its DeepSeek-V3 model. The update shows significant gains in mathematical reasoning and is reported to outperform OpenAI's GPT-4.5 and Anthropic's Claude 3.7 Sonnet on several key benchmarks, including MMLU-Pro, GPQA, and AIME.

The model also improves at web development and Chinese language processing. In independent testing, it scored 55% on the Polyglot benchmark, placing it second among models without specialized reasoning abilities.

That matters because Deepseek is not only chasing benchmark leadership. It is doing so with a model made freely available for research and development under the MIT license. In a market where many capable models are closed, access is part of the story.

Why the earlier V3 model already stood out

The new release builds on attention Deepseek had already earned with its original V3 model. According to independent testing firm Artificial Analysis, that model could compete with some of the world's most advanced AI systems despite a reported total training cost of just $5.6 million.

In Artificial Analysis' comprehensive Quality Index, Deepseek-V3 scored 80 points. That placed it in the top tier alongside Gemini 1.5 Pro and Claude 3.5 Sonnet. Google's Gemini and OpenAI's latest models still led overall, but Deepseek-V3 surpassed every other open-source model available at the time.

Its technical benchmark results were especially strong. Deepseek-V3 scored 92% on the HumanEval programming test and 85% on the MATH 500 challenge. The source links those results to Deepseek's R1 reasoning model from late November, which was used to improve V3's problem-solving abilities.

Meta's chief AI researcher, Yann LeCun, also took notice, calling the model "excellent." The source also notes an important caveat: strong benchmark scores do not guarantee strong real-world performance. Still, the combination of test results, low cost, and technical transparency has made the AI community pay attention.

Efficiency is the core disruption

The most important part of the Deepseek story may be how the model was trained. According to AI expert Andrej Karpathy, a model of this level would typically require somewhere between 16,000 and 100,000 GPUs.

Deepseek trained its 671-billion-parameter model on 2,048 Nvidia H800 GPUs running for 57 days, a total of 2.78 million GPU hours. By comparison, Meta used about 30.8 million GPU hours, roughly 11 times as many, to train Llama 3, a 405-billion-parameter model.
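The figures above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in this article. The variable and function names are illustrative, and the per-GPU-hour rate is merely what the quoted cost and GPU hours imply, not a price stated by the source.

```python
# Figures quoted in the article; everything below them is derived arithmetic.
GPUS = 2_048                  # Nvidia H800 GPUs used for DeepSeek-V3
DAYS = 57                     # reported training duration
REPORTED_GPU_HOURS = 2.78e6   # reported total for DeepSeek-V3
LLAMA3_GPU_HOURS = 30.8e6     # reported total for Meta's Llama 3 (405B parameters)
REPORTED_COST_USD = 5.6e6     # total training cost cited by Artificial Analysis


def gpu_hours(gpus: int, days: int) -> int:
    """GPU hours if every GPU ran around the clock for the whole period."""
    return gpus * days * 24


# Upper bound assuming uninterrupted operation: 2,048 * 57 * 24 = 2,801,664.
continuous_run = gpu_hours(GPUS, DAYS)

# The reported total sits just under that bound, so the two figures agree.
share_of_continuous = REPORTED_GPU_HOURS / continuous_run

# How many times more GPU hours Llama 3 consumed (about 11x).
compute_ratio = LLAMA3_GPU_HOURS / REPORTED_GPU_HOURS

# Implied cost per GPU hour (assumption: the quoted cost covers only these hours).
implied_rate_usd = REPORTED_COST_USD / REPORTED_GPU_HOURS

print(f"Continuous-run GPU hours: {continuous_run:,}")
print(f"Reported / continuous:    {share_of_continuous:.1%}")
print(f"Llama 3 vs. DeepSeek-V3:  {compute_ratio:.1f}x")
print(f"Implied USD per GPU hour: {implied_rate_usd:.2f}")
```

The comparison is only a rough one: the two models were trained on different hardware and with different architectures, so GPU hours are not a like-for-like measure of effort.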

Karpathy called Deepseek's budget "a joke" for a model of this caliber. He also wrote, "You have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms."

The efficiency gains do not mean large GPU clusters are obsolete. Karpathy still expects major compute resources to remain necessary for frontier language model development. But Deepseek shows that better use of data, algorithms, and available hardware can meaningfully change the economics.

Constraints shaped the engineering

Deepseek's position as a Chinese company is central to the technical story. Because of U.S. export restrictions, it had limited access to the latest Nvidia chips. The company worked with H800 GPUs, Nvidia AI chips with reduced capabilities designed for the Chinese market.

Those chips have much slower connection speeds between GPUs than the H100s used in Western labs. Deepseek responded by building custom solutions for processor communication instead of relying on off-the-shelf options.

The result is a useful lesson for the wider AI market: restricted hardware access can push teams to improve software, training strategy, and system design. The source also connects this to European AI development, where some advanced models do not reach the EU because companies like Meta and OpenAI either can't or won't adapt to the EU AI Act.

Open-source pressure meets price pressure

Deepseek is also competing on price. Artificial Analysis reported that Deepseek V3 costs a bit more than OpenAI's GPT-4o-mini or Google's Gemini 1.5 Flash, but remains cheaper than other models with similar capabilities. Cached requests receive a 90% discount, making it the most cost-effective option in its class.

The company did raise prices from its previous version. Input costs doubled to $0.27 per million tokens, while output costs rose fourfold to $1.10 per million tokens. Deepseek kept V3 at the old pricing until early February and made the model available to try for free on its chat platform.
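To make the pricing concrete, here is a rough cost estimator built from the per-token prices and the 90% cache discount quoted above. It is a sketch under stated assumptions: the function name and parameters are hypothetical, and applying the cache discount to input tokens only is an assumption, since the article does not say which tokens the discount covers.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = 0.27
OUTPUT_USD_PER_M = 1.10
CACHE_DISCOUNT = 0.90  # from the article; ASSUMED here to apply to input tokens only

# Earlier prices implied by "doubled" and "fourfold" (derived, not quoted).
OLD_INPUT_USD_PER_M = INPUT_USD_PER_M / 2
OLD_OUTPUT_USD_PER_M = OUTPUT_USD_PER_M / 4


def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of a request (hypothetical helper).

    cached_fraction is the share of input tokens billed at the cached rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached input tokens pay only the undiscounted remainder of the price.
    billable_input = fresh + cached * (1 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# One million tokens in and out, without and with a fully cached prompt.
print(f"No cache:   ${request_cost(1_000_000, 1_000_000):.3f}")
print(f"Full cache: ${request_cost(1_000_000, 1_000_000, cached_fraction=1.0):.3f}")
```

Under these assumptions, output tokens dominate the bill for balanced workloads, so the cache discount matters most for prompt-heavy use such as long shared system prompts.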

The larger implication is straightforward: open-source AI is becoming harder for the biggest labs to dismiss. DeepSeek-V3-0324 suggests that Deepseek may have a foundation for an R2 model. Its predecessor, the R1 reasoning model, made headlines as the first open-source model to compete with OpenAI's o1 and disrupted U.S. stock markets in the days following its debut.

At the same time, the source is clear that compute demand is not going away. The industry is shifting toward scaling inference time, meaning the amount of time a model is given to generate answers. If that approach takes hold, AI development will continue to require significant compute, and likely more of it over time.