MIT Tech Review AI, January 31, 2025

Why DeepSeek's efficiency story gets more complicated on energy

DeepSeek has been presented as a sign that AI could become less energy hungry, but the picture is more complicated. Early figures suggest its reasoning approach may shift energy demand from training to inference, especially when models produce longer answers.

DeepSeek has quickly become part of the global AI conversation, and one of the biggest claims around it is that a more efficient model could ease AI's growing energy problem. The source article argues that this conclusion is too simple.

The key issue is where the energy is used. DeepSeek may point to more efficient training methods, but early figures suggest that its reasoning process can make answering questions more energy intensive.

Training efficiency is only one part of AI energy use

Every AI model has two broad phases: training and inference. Training is the process in which a model learns from data, often over months. Inference happens later, every time a user asks the model to produce an answer.

Both phases usually run in data centers, where chips and cooling systems require substantial energy. That means a model can look efficient in one part of its life cycle while still creating heavier demand somewhere else.

DeepSeek's R1 model attracted attention partly because its team improved a technique called a “mixture of experts.” In that approach, only part of a model's billions of parameters is activated at a given time during training. The team also improved reinforcement learning, a process where a model's outputs are scored and used to make the model better.
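A mixture-of-experts layer can be pictured as a gating step that scores every expert for a given input and then runs only the top few. The sketch below is a minimal illustration under stated assumptions: four toy "experts" that simply scale their input, hand-picked gate scores, and a top-2 routing rule. Real MoE layers route tokens through neural sub-networks with a learned gate, and nothing here reflects DeepSeek's specific design; all function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    routes = route(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Four toy experts (hypothetical): each just scales its input by a constant.
experts = [lambda x, c=c: c * x for c in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate scores for one input; a real model learns the gate.
out, used = moe_forward(1.0, experts, gate_scores=[0.1, 2.0, 0.5, 3.0], k=2)
print(used, round(out, 3))  # → [3, 1] 3.462
print(f"{len(used)} of {len(experts)} experts ran")  # → 2 of 4 experts ran
```

Only the two selected experts are evaluated, which is the source of the compute savings: the skipped experts' parameters exist but do no work for this input.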

The article notes that the scoring step in reinforcement learning is often done by human annotators, while the DeepSeek team became effective at automating it. That matters because it suggests a way to make training more efficient.
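One way to picture automated scoring is a programmatic reward function that checks a model's final answer against a known reference, so no human rater is needed for that step. This is a minimal sketch, assuming tasks with a single checkable answer and a hypothetical `Answer: ...` output convention; it is not presented as DeepSeek's actual pipeline, and the function names are illustrative.

```python
import re

def extract_answer(output: str):
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    matches = re.findall(r"Answer:\s*(.+)", output)
    return matches[-1].strip() if matches else None

def automated_reward(output: str, reference: str) -> float:
    """Score 1.0 for an exact-match final answer, 0.0 otherwise; no human rater involved."""
    answer = extract_answer(output)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

samples = [
    "First add 2 and 3. Answer: 5",
    "I think it's 6. Answer: 6",
    "No final answer given",
]
print([automated_reward(s, "5") for s in samples])  # → [1.0, 0.0, 0.0]
```

In a reinforcement learning loop, scores like these would be fed back to update the model; the point here is only that the scoring itself can run without annotators when answers are verifiable.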

But lower cost does not automatically mean lower total energy use. Anthropic cofounder Dario Amodei wrote that “Because the value of having a more intelligent system is so high,” it “causes companies to spend more, not less, on training models.” He also wrote that “The gains in cost efficiency end up entirely devoted to training smarter models, limited only by the company’s financial resources.”

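In plain language, if companies can get more model capability for the same money, they may choose to build stronger models rather than stop spending. The article frames this as an example of the Jevons paradox: making a resource cheaper or more efficient to use can increase, rather than reduce, total consumption of it.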

Why reasoning changes the equation

The more uncertain part of the DeepSeek energy story is inference. DeepSeek is designed as a reasoning model, which means it is intended to perform well on logic, pattern-finding, math, and other tasks that ordinary generative AI models can struggle with.

Reasoning models use a method called “chain of thought.” Instead of jumping straight to an answer, the model breaks a task into parts and works through them in order before reaching a conclusion.

That can help on certain benchmarks. The source article mentions MMLU, which tests knowledge and problem-solving across 57 subjects. But the same process can require more computation because the model is doing more work before it responds.

The article gives a simple example: when asked whether it is okay to lie to protect someone’s feelings, DeepSeek first considers the question through utilitarianism, then Kantian ethics, then other nuances before giving an answer. Its conclusion is that lying is “generally acceptable in situations where kindness and prevention of harm are paramount, yet nuanced with no universal solution.”

That kind of step-by-step reasoning can produce a richer answer. It can also produce a longer one, and longer answers can consume more energy during inference.
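The link between reasoning and inference energy can be sketched with a simple linear model in which every generated token costs the same energy, so a chain of thought adds its reasoning tokens on top of the final answer. The per-token figure and token counts below are hypothetical and chosen only for illustration. The model also understates the effect somewhat, because in practice the cost of each new token rises as the context grows.

```python
def inference_energy_j(reasoning_tokens: int, answer_tokens: int, joules_per_token: float) -> float:
    """Energy for one response under a linear model: each generated token costs the same."""
    return (reasoning_tokens + answer_tokens) * joules_per_token

J_PER_TOKEN = 2.0  # hypothetical figure, for illustration only

direct = inference_energy_j(reasoning_tokens=0, answer_tokens=300, joules_per_token=J_PER_TOKEN)
cot = inference_energy_j(reasoning_tokens=1_000, answer_tokens=300, joules_per_token=J_PER_TOKEN)

print(f"direct answer: {direct:.0f} J")   # → direct answer: 600 J
print(f"with reasoning: {cot:.0f} J")     # → with reasoning: 2600 J
print(f"ratio: {cot / direct:.2f}x")      # → ratio: 4.33x
```

Under this model, the hardware's per-token efficiency can be unchanged while energy per answer still multiplies, simply because more tokens are produced.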

Early tests point to a murkier result

Scott Chamberlin, who spent years at Microsoft and later Intel building tools to reveal the environmental costs of digital activities, ran initial tests on DeepSeek's energy use. The article is careful about the limits of those figures.

The experiment tested only a medium-size version of DeepSeek's R1, used only a small number of prompts, and is hard to compare with other reasoning models. Chamberlin also noted that DeepSeek is “really the first reasoning model that is fairly popular that any of us have access to.” OpenAI's o1 model is described as its closest competitor, but OpenAI does not make o1 open for this kind of testing.

Instead, Chamberlin compared DeepSeek with a Meta model with the same number of parameters: 70 billion. The results show why claims about efficiency need caution.

The prompt about lying produced a 1,000-word response from DeepSeek.
That response took 17,800 joules to generate.
The article says that is about what it takes to stream a 10-minute YouTube video.
For that prompt, DeepSeek used about 41% more energy than Meta's model.
Across 40 prompts, DeepSeek had similar energy efficiency to the Meta model, but it tended to generate much longer responses and used 87% more energy.
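The figures in this list can be cross-checked with simple arithmetic. This is a minimal sketch, assuming that "41% more" means DeepSeek's energy was 1.41 times Meta's and that "similar energy efficiency" means roughly equal energy per generated token, so total energy scales with response length. The Meta figure and the length ratio it prints are derived estimates, not numbers the article reports.

```python
JOULES_PER_WH = 3600.0  # 1 watt-hour = 3,600 joules

def implied_baseline(energy_j: float, pct_more: float) -> float:
    """Baseline energy implied when energy_j is pct_more percent above it."""
    return energy_j / (1 + pct_more / 100)

def implied_length_ratio(pct_more: float) -> float:
    """If energy per token is equal, the energy ratio equals the output-length ratio."""
    return 1 + pct_more / 100

deepseek_lying_j = 17_800                               # reported: the lying prompt
meta_lying_j = implied_baseline(deepseek_lying_j, 41)   # derived, not reported

print(f"DeepSeek, lying prompt: {deepseek_lying_j / JOULES_PER_WH:.2f} Wh")  # → DeepSeek, lying prompt: 4.94 Wh
print(f"Implied Meta, lying prompt: {meta_lying_j:,.0f} J")                 # → Implied Meta, lying prompt: 12,624 J
print(f"Implied length ratio, 40 prompts: {implied_length_ratio(87):.2f}x")  # → Implied length ratio, 40 prompts: 1.87x
```

Read this way, the 40-prompt result says DeepSeek's answers were roughly 1.9 times as long on average, and that length, not per-token inefficiency, drove the extra energy.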

The article also cites tests run in October 2024 by a team at the University of Michigan. Those tests found that the 70-billion-parameter version of Meta's Llama 3.1 averaged just 512 joules per response.

These comparisons are not presented as a final scientific verdict. The models have different purposes, and the article states that a scientifically sound study comparing DeepSeek's energy use with competitors has not been done.

The bigger risk is using reasoning everywhere

The central concern is not just DeepSeek by itself. It is what happens if other companies copy the approach and add chain-of-thought reasoning to many products where it is not needed.

Sasha Luccioni, an AI researcher and climate lead at Hugging Face, warns that broad adoption could erase any efficiency benefit. “If we started adopting this paradigm widely, inference energy usage would skyrocket,” she says. “If all of the models that are released are more compute intensive and become chain-of-thought, then it completely voids any efficiency gains.”

The article compares this possible shift with an earlier one. Before ChatGPT launched in 2022, much AI work was extractive: finding information in text or categorizing images. After 2022, the focus moved toward generative AI, which makes better and better predictions and requires more energy.

Luccioni calls that “the first paradigm shift.” According to her research, that shift led to orders of magnitude more energy being used to accomplish similar tasks.

The source article suggests that chain-of-thought reasoning could become another major shift if companies add it broadly, the way generative AI has been added to Google search and messaging apps. OpenAI announced on January 31 that it would expand access to its own reasoning model, o3, which points in the same direction.

Efficiency gains still need proof at scale

DeepSeek may still matter because it shows that AI training methods can change. But the energy story depends on the full life cycle of a model, not just the cost of training it.

If a model uses less energy to train but more energy each time it answers, the result depends on how often it is used, how long its answers are, and whether reasoning is reserved for tasks that need it. The article does not claim those questions are settled.
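The trade-off in the paragraph above can be written as a simple life-cycle model: total energy equals one-off training energy plus per-answer inference energy multiplied by the number of answers. This is a minimal sketch with made-up numbers (they are not DeepSeek or Meta measurements), and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class ModelEnergy:
    training_j: float   # one-off training energy (hypothetical)
    per_query_j: float  # average inference energy per answer (hypothetical)

    def total(self, queries: int) -> float:
        """Life-cycle energy after serving `queries` answers."""
        return self.training_j + self.per_query_j * queries

def break_even_queries(a: ModelEnergy, b: ModelEnergy) -> float:
    """Queries after which `a` (cheaper to train) has used more total energy than `b`.

    Assumes a.training_j <= b.training_j. Returns math.inf if `a` is never
    costlier per answer, since it then never catches up.
    """
    if a.training_j > b.training_j:
        raise ValueError("expected model a to need less training energy than model b")
    saved = b.training_j - a.training_j    # training energy a saves relative to b
    extra = a.per_query_j - b.per_query_j  # extra inference energy a spends per answer
    if extra <= 0:
        return math.inf
    return saved / extra

# Illustrative only: a trains on half the energy but spends 50% more per answer.
cheap_train = ModelEnergy(training_j=1.0e12, per_query_j=3_000)
cheap_answer = ModelEnergy(training_j=2.0e12, per_query_j=2_000)

n = break_even_queries(cheap_train, cheap_answer)
print(f"{n:,.0f} answers")  # → 1,000,000,000 answers
```

The break-even point is the training energy saved divided by the extra energy per answer, so a widely used model with long reasoning outputs can pass it quickly, which is why usage volume and answer length matter as much as training cost.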

Nathan Benaich, founder and general partner at Air Street Capital, frames the business question directly: “It will depend on whether or not the trade-off is economically worthwhile for the business in question.” He adds that “The energy costs would have to be off the charts for them to play a meaningful role in decision-making.”

For now, the main lesson is caution. DeepSeek's efficiency gains in training do not automatically translate into lower AI energy consumption overall. If reasoning models spread widely and inference demand rises with them, the energy outlook may become less promising, not more.