Ars Technica AI January 21, 2025 TERMINATOR

Why DeepSeek R1 makes open reasoning AI harder to ignore

DeepSeek has released the R1 model family under an MIT license, including a 671-billion-parameter flagship model and smaller distilled versions. The company says R1 performs near OpenAI’s o1 on several reasoning benchmarks, though those claims have not yet been independently verified.

WTF Index TERMINATOR

Terminator: 2, Idiocracy: 0

Openly releasing advanced reasoning capability makes powerful AI more accessible, but the article does not describe concrete harm or loss of control.


DeepSeek R1 has quickly become one of the most closely watched AI releases because it puts advanced reasoning-style capability into a model family that can be downloaded, studied, modified, and used commercially under an MIT license.

The release matters because simulated reasoning models have mostly been associated with proprietary systems such as OpenAI’s o1. DeepSeek now claims that its R1 family can compete with that level of performance on several math and coding tests, while also offering smaller versions that can run on less demanding hardware.

What DeepSeek Released

On Monday, Chinese AI lab DeepSeek released the R1 model family. The release includes DeepSeek-R1-Zero and DeepSeek-R1, and the largest version contains 671 billion parameters.

DeepSeek also published six smaller DeepSeek-R1-Distill models. These range from 1.5 billion to 70 billion parameters and are based on existing open source model families, including Qwen and Llama. The smaller models were trained on data generated by the full R1 model.
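DeepSeek's exact training recipe is not described here, so the following is only a minimal sketch of the general idea behind this kind of distillation: run a large "teacher" model on a set of prompts and keep its full outputs, reasoning included, as supervised training targets for a smaller "student" model. `TrainingExample`, `build_distillation_set`, and `toy_teacher` are hypothetical names, and the student fine-tuning step itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    prompt: str
    target: str  # the teacher's complete output, reasoning section included


def build_distillation_set(
    prompts: List[str], teacher: Callable[[str], str]
) -> List[TrainingExample]:
    """Collect teacher outputs as supervised targets for a smaller student model."""
    examples = []
    for prompt in prompts:
        output = teacher(prompt)
        # This sketch only drops empty generations; it applies no other quality filter.
        if output.strip():
            examples.append(TrainingExample(prompt=prompt, target=output))
    return examples


# Hypothetical stand-in for the large teacher model (not a real R1 call).
def toy_teacher(prompt: str) -> str:
    return f"<think>Working through: {prompt}</think>Answer to {prompt}"


dataset = build_distillation_set(["What is 2 + 2?", "Name a prime number."], toy_teacher)
for example in dataset:
    print(example.prompt, "->", example.target)
```

The resulting prompt/target pairs would then feed an ordinary fine-tuning run on the smaller model, which is how distillation can carry some of the teacher's behavior into a model small enough for modest hardware.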

That range is important. The smallest version can run on a laptop, while the full model needs far more substantial computing resources. In practice, the release gives researchers, developers, and technically inclined users multiple ways to explore the same family of capabilities.

License: MIT
Largest model: 671 billion parameters
Distilled models: 1.5 billion to 70 billion parameters
Distilled model bases: Qwen and Llama
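A rough back-of-the-envelope calculation shows why these sizes split between laptops and far larger machines. This sketch counts only the memory needed to hold the weights, assuming either 16 bits per parameter or a 4-bit quantized copy; it ignores activations, the KV cache, and runtime overhead, so real requirements are higher. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width; activations,
    KV cache, and framework overhead are not counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


models = [("R1-Distill 1.5B", 1.5), ("R1-Distill 70B", 70), ("R1 671B", 671)]
for name, size in models:
    for bits in (16, 4):
        print(f"{name:16s} {bits:2d}-bit: {weight_memory_gb(size, bits):8.1f} GB")
```

At 4 bits, the 1.5-billion-parameter model's weights fit in under a gigabyte, while the 671-billion-parameter model still needs hundreds of gigabytes even before any runtime overhead.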

Why Reasoning Models Are Different

R1 belongs to a class of models often described as simulated reasoning models, or SR models. These systems use an inference-time reasoning approach: they spend extra time while generating an answer, working through a problem step by step before committing to a response.

That behavior is different from the conventional large language model experience, where the model often produces a response more directly. In SR models, the extra time can improve performance on tasks involving math, physics, and science.

OpenAI’s o1 model family brought this approach into wider attention when it debuted in September 2024. OpenAI later teased a major upgrade called o3 in December 2024. DeepSeek R1 is drawing attention because it appears to bring similar behavior into an open-weights release.

Independent AI researcher Simon Willison tested one of the smaller models and described the experience in vivid terms. He told Ars, “They are SO much fun to run, watching them think is hilarious.”

Willison also noted that responses begin with a pseudo-XML chain-of-thought section using a <think>...</think> tag. Even simple prompts can lead the model to generate extensive internal reasoning before giving its final output.
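Anyone running the models locally who wants only the final answer has to separate that reasoning section from the rest of the output. A minimal sketch in Python, assuming the response contains at most one <think>...</think> block and that it comes before the answer; `split_reasoning` is a hypothetical helper, not part of any DeepSeek tooling.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> section; DOTALL lets it span newlines,
# and the non-greedy .*? stops at the first closing tag.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes at most one <think> block, placed before the answer. If no block
    is found, the whole response is treated as the answer.
    """
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after </think>
    return reasoning, answer


sample = "<think>\nThe user said hi. I should greet them back.\n</think>\nHello! How can I help?"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the reasoning rather than discarding it can be useful too: it is the part Willison found entertaining to watch, and it shows how much text the model generates before answering.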

The Benchmark Claims

DeepSeek says R1 performs at levels comparable to OpenAI’s o1 on several math and coding benchmarks. The company reports that R1 outperformed o1 on AIME, MATH-500, and SWE-bench Verified.

Those benchmarks cover different kinds of problem solving. AIME is a mathematical reasoning test. MATH-500 is a collection of word problems. SWE-bench Verified is a programming assessment tool.

These claims are central to why the release is being discussed so widely. Open-weights models have often lagged behind proprietary systems such as OpenAI’s o1 on reasoning benchmarks. If DeepSeek’s claims hold up under outside scrutiny, the gap between openly available models and closed commercial systems may be narrowing.

At the same time, AI benchmarks deserve caution, and the R1 results have not yet been independently verified. The release is significant, but its strongest performance claims still need outside confirmation.

Open Access Changes The Stakes

The MIT license is a major part of the story. It means the model can be studied, modified, and used commercially. For developers and researchers, that is different from simply accessing a model through a hosted product.

Local use also changes who can experiment with this kind of technology. The smallest distilled model can run on a laptop, making at least some part of the R1 family available without the infrastructure required for the full 671 billion parameter version.

TechCrunch reports that three Chinese labs (DeepSeek, Alibaba, and Moonshot AI’s Kimi) have now released models they say match o1’s capabilities. DeepSeek first previewed R1 in November. The broader point is that o1-like capability is no longer being claimed by only one lab or one product family.

Dean Ball, an AI researcher at George Mason University, wrote on X that the performance of DeepSeek’s distilled models means capable reasoners will continue to spread widely and run on local hardware. His point reflects the practical impact of distillation: smaller models can carry important parts of the larger model’s behavior into more accessible environments.

The Cloud Version Has A Limitation

There is also a notable constraint. The cloud-hosted version of R1 will not generate responses about certain topics, including Tiananmen Square or Taiwan’s autonomy.

This reflects the model’s Chinese origin: according to Chinese Internet regulations, the service must “embody core socialist values.” The filtering comes from an added moderation layer.

That limitation does not apply in the same way when the model is run locally outside of China. This distinction is important because the same model family can behave differently depending on whether it is accessed through a cloud service or run on local hardware.
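The internals of that moderation layer are not described, so the following is only a minimal sketch of the general pattern: a filter wrapped around the model by the hosting service, which explains why the same weights run locally, without the wrapper, can respond differently. The keyword matching, refusal text, placeholder term, and all function names are hypothetical.

```python
from typing import Callable, Iterable

REFUSAL = "Sorry, I can't discuss that topic."  # hypothetical refusal text


def with_moderation(
    model: Callable[[str], str], blocked_terms: Iterable[str]
) -> Callable[[str], str]:
    """Wrap a model so prompts or outputs containing a blocked term get a refusal."""
    terms = [t.lower() for t in blocked_terms]

    def moderated(prompt: str) -> str:
        # Check the incoming prompt before calling the model at all.
        if any(t in prompt.lower() for t in terms):
            return REFUSAL
        output = model(prompt)
        # Check the model's output as well, since a harmless prompt can still
        # lead to a filtered topic.
        if any(t in output.lower() for t in terms):
            return REFUSAL
        return output

    return moderated


# Hypothetical stand-in for the same weights run locally, with no wrapper.
def local_model(prompt: str) -> str:
    return f"Model answer about {prompt}"


hosted_model = with_moderation(local_model, ["topic x"])  # placeholder term
print(hosted_model("Topic X history"))  # filtered by the wrapper
print(local_model("Topic X history"))   # the bare model answers
```

Because the filter lives outside the model, removing the wrapper (as happens when the weights are downloaded and run elsewhere) removes this particular restriction, which matches the cloud-versus-local difference described above.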

DeepSeek R1 is therefore not just another model release. It is a test of how far open reasoning AI has advanced, how much capability can be compressed into smaller models, and how quickly public systems may approach the performance of leading proprietary tools.