Why AI energy use is still so hard to measure

OpenAI CEO Sam Altman said the average ChatGPT query uses 0.34 watt-hours of energy, but researchers say that figure lacks the context needed to judge it. New research points to a broader problem: most large language model use happens on systems with no environmental disclosure.

AI is becoming a normal part of search, writing, coding, and everyday problem-solving. But the environmental cost of that use remains difficult to pin down because the companies best positioned to provide clear numbers often do not share enough information.

That gap matters. Researchers are trying to measure AI energy use and AI carbon emissions, yet even widely repeated claims can rest on thin evidence when model providers do not disclose how their systems are run.

A single ChatGPT number is not enough

Sam Altman, the CEO of OpenAI, recently wrote that the average ChatGPT query uses 0.34 watt-hours of energy. He compared that to what an oven would use in a little over one second, or what a high-efficiency lightbulb would use in a couple of minutes.
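The comparison itself can be checked with simple arithmetic. The sketch below converts 0.34 watt-hours into running time for two appliances. The wattages (a 1,200-watt oven and a 10-watt LED bulb) are illustrative assumptions, not figures supplied by Altman or OpenAI.

```python
# Convert a per-query energy figure into appliance running time.
# The appliance wattages are illustrative assumptions, not sourced values.

QUERY_WH = 0.34          # Altman's stated average, in watt-hours
OVEN_WATTS = 1200.0      # assumed oven power draw
LED_BULB_WATTS = 10.0    # assumed high-efficiency bulb power draw


def seconds_of_use(energy_wh: float, power_watts: float) -> float:
    """Return how many seconds a device drawing power_watts runs on energy_wh."""
    joules = energy_wh * 3600.0      # 1 Wh = 3,600 J
    return joules / power_watts      # 1 W = 1 J/s, so J / W gives seconds


oven_s = seconds_of_use(QUERY_WH, OVEN_WATTS)
bulb_s = seconds_of_use(QUERY_WH, LED_BULB_WATTS)
print(f"Oven: {oven_s:.2f} s")            # Oven: 1.02 s
print(f"LED bulb: {bulb_s / 60:.2f} min") # LED bulb: 2.04 min
```

Under those assumed wattages the analogy holds up, but the check only confirms the unit conversion. It says nothing about whether 0.34 watt-hours is the right starting number.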

On its own, the number sounds precise. But experts say it means little without more explanation from OpenAI. A meaningful figure would need to show how the calculation was made and what was included.

Several basic questions remain open:

  • What counts as an “average” query?
  • Does the figure include image generation?
  • Does it include training AI models?
  • Does it include cooling OpenAI’s servers?

OpenAI has 800 million weekly active users, a number that is still growing, so per-query figures that seem small can add up at scale. But without the assumptions behind the estimate, researchers cannot tell how representative the number is.
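To see why scale matters, a rough sketch can multiply the per-query figure by the user count. The number of queries per user per week is not public, so the values tried here are hypothetical placeholders, and the function name is an illustration rather than anything from OpenAI or the researchers.

```python
# Rough scaling of a per-query figure to a weekly total.
# queries_per_user is NOT a published number; the values below are hypothetical.

WEEKLY_USERS = 800_000_000   # weekly active users cited in the article
QUERY_WH = 0.34              # Altman's stated per-query average


def weekly_energy_mwh(users: int, queries_per_user: float, wh_per_query: float) -> float:
    """Return total weekly energy in megawatt-hours."""
    total_wh = users * queries_per_user * wh_per_query
    return total_wh / 1_000_000  # 1 MWh = 1,000,000 Wh


for queries in (1, 10, 50):  # hypothetical usage levels
    mwh = weekly_energy_mwh(WEEKLY_USERS, queries, QUERY_WH)
    print(f"{queries:>2} queries/user/week -> {mwh:,.0f} MWh")
```

The total swings by a factor of 50 across these placeholder inputs, which is the point: the headline number cannot be turned into an overall footprint without the usage data behind it.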

Sasha Luccioni, the climate lead at AI company Hugging Face, was blunt about the lack of supporting detail. “He could have pulled that out of his ass,” she says. OpenAI did not respond to a request for more information about how it arrived at this number.

Most LLM use has no environmental disclosure

The larger issue is not one company’s single estimate. It is the lack of environmental transparency across major AI models.

An analysis submitted for peer review this week by Luccioni and three other authors examines that problem. Using data from OpenRouter, a leaderboard of large language model traffic, the researchers found that 84 percent of LLM use in May 2025 was for models with zero environmental disclosure.

That means users are overwhelmingly choosing AI systems without knowing their environmental impact. They may know whether a model is fast, impressive, or convenient, but not whether it is efficient, what emissions it produces, or how its infrastructure affects the final carbon cost.

Luccioni argues that this stands in sharp contrast to other consumer choices. “It blows my mind that you can buy a car and know how many miles per gallon it consumes, yet we use all these AI tools every day and we have absolutely no efficiency metrics, emissions factors, nothing,” she says. “It’s not mandated, it’s not regulatory. Given where we are with the climate crisis, it should be top of the agenda for regulators everywhere.”

Weak estimates can become accepted facts

When official numbers are missing, rough estimates can fill the space. That can create a second problem: claims that were never strongly supported may be repeated until they sound authoritative.

One example is the claim that the average ChatGPT request uses 10 times as much energy as the average Google search. Luccioni and her colleagues traced that claim to a public remark made in 2023 by John Hennessy, the chairman of Alphabet, the parent company of Google.

The source matters. The statement came from a board member of one company discussing another company’s product. Luccioni’s analysis found that the figure has still appeared repeatedly in press and policy reports.

“People have taken an off-the-cuff remark and turned it into an actual statistic that’s informing policy and the way people look at these things,” Luccioni says. “The real core issue is that we have no numbers. So even the back-of-the-napkin calculations that people can find, they tend to take them as the gold standard, but that’s not the case.”

Open-source models give researchers a clearer view

One way researchers can get closer to real measurements is by studying open-source models. Proprietary systems from companies including OpenAI and Anthropic are harder for outside researchers to independently verify, because key details are not available.

By contrast, models with publicly available components can be evaluated more directly. A study published Thursday in the journal Frontiers in Communication examined 14 open-source large language models, including two Meta Llama models and three DeepSeek models.

The study used 1,000 benchmark prompts covering topics such as high school history and philosophy. Half were multiple-choice questions that could be answered in a single word. The other half were open prompts that allowed longer responses.

The results showed meaningful differences. Some models used as much as 50 percent more energy than other models in the dataset when responding to the researchers’ prompts. Reasoning models generated far more thinking tokens, the intermediate reasoning output a model produces while it works toward an answer. Generating more thinking tokens is associated with greater energy use.

The more complex models were also more accurate on complex topics. But they sometimes struggled with brevity. During the multiple-choice phase, for example, more complex models often returned multi-token answers even when instructed to choose only from the available options.

Efficiency depends on the task and the infrastructure

Maximilian Dauner, a PhD student at the Munich University of Applied Sciences and the study’s lead author, says AI use could become more efficient if simpler requests were routed to less-energy-intensive models that can still answer accurately.

“Even smaller models can achieve really good results on simpler tasks, and don’t have that huge amount of CO2 emitted during the process,” he says.
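The routing idea can be sketched in a few lines. This is a minimal illustration, not the method used in the study or by any provider: the model names, the relative energy costs, and the keyword heuristic are all hypothetical, and a production system would more likely use a trained classifier to judge difficulty.

```python
# Minimal sketch of routing simple prompts to a smaller model.
# Model names, energy values, and the heuristic are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    relative_energy: float  # hypothetical relative cost per request


SMALL = Model("small-model", 1.0)
LARGE = Model("large-reasoning-model", 5.0)

# Phrases that hint a prompt needs multi-step reasoning (assumed list).
REASONING_HINTS = ("prove", "derive", "step by step", "explain why")


def looks_simple(prompt: str) -> bool:
    """Crude heuristic: a short prompt with no reasoning hints."""
    text = prompt.lower()
    is_short = len(text.split()) <= 20
    needs_reasoning = any(hint in text for hint in REASONING_HINTS)
    return is_short and not needs_reasoning


def route(prompt: str) -> Model:
    """Pick the cheaper model when the prompt looks simple."""
    return SMALL if looks_simple(prompt) else LARGE


print(route("What year did World War II end?").name)
print(route("Explain why the Roman Republic collapsed, step by step.").name)
```

The tradeoff is that a router can misjudge a prompt and send a hard question to a model that answers it poorly, so any energy saved has to be weighed against accuracy.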

Some companies already use that kind of approach. Google and Microsoft have previously told WIRED that their search features use smaller models when possible, which can also produce faster responses for users.

Still, model providers generally do little to guide users toward lower-energy choices. Noman Bashir, the Computing & Climate Impact Fellow at MIT’s Climate and Sustainability Consortium, says the speed of a model’s answer has a major effect on energy use, but users are not usually shown that tradeoff when choosing AI tools.

“The goal is to provide all of this inference the quickest way possible so that you don’t leave their platform,” he says. “If ChatGPT suddenly starts giving you a response after five minutes, you will go to some other tool that is giving you an immediate response.”

Hardware and data center conditions also affect emissions. Dauner ran his experiments on an Nvidia A100 GPU, while Nvidia’s H100 GPU, which the company says is becoming increasingly popular, is much more energy-intensive.

The physical data center matters too. Cooling systems, lighting, and networking equipment add energy demand. Data center load can follow a diurnal cycle as query volume rises and falls over the day, and the emissions tied to a query also depend on whether the local grid is powered mainly by fossil fuels or renewables.
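These factors can be combined in a back-of-the-envelope formula: server energy, multiplied by a facility overhead factor, multiplied by the carbon intensity of the grid. The sketch below uses power usage effectiveness (PUE), a standard ratio of total facility energy to computing energy, as the overhead factor. The PUE and grid intensity values are illustrative assumptions, and it is also an assumption that the 0.34 watt-hour figure covers server energy only, since OpenAI has not said what it includes.

```python
# Back-of-the-envelope emissions per query.
# All input values are illustrative assumptions, not measured data.


def grams_co2_per_query(it_energy_wh: float, pue: float, grid_g_per_kwh: float) -> float:
    """Estimate grams of CO2 for one query.

    it_energy_wh:   energy used by the servers alone, in watt-hours
    pue:            facility overhead factor (total energy / IT energy)
    grid_g_per_kwh: grid carbon intensity in grams of CO2 per kilowatt-hour
    """
    facility_kwh = it_energy_wh * pue / 1000.0  # add overhead, convert Wh to kWh
    return facility_kwh * grid_g_per_kwh


QUERY_WH = 0.34      # stated per-query figure; scope is unknown
ASSUMED_PUE = 1.2    # hypothetical overhead for cooling, lighting, networking

for label, intensity in (("low-carbon grid", 50.0), ("fossil-heavy grid", 700.0)):
    grams = grams_co2_per_query(QUERY_WH, ASSUMED_PUE, intensity)
    print(f"{label}: {grams:.4f} g CO2 per query")
```

With these placeholder inputs the same query produces 14 times the emissions on the fossil-heavy grid, even though the model and the hardware are unchanged.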

The result is a measurement problem with real consequences. AI energy use cannot be understood from a single number without knowing the model, the task, the hardware, the data center, and the grid behind it. Until providers disclose more, users and policymakers are left trying to judge the climate impact of AI with only partial information.