Deepseek has put unusually specific numbers around the economics of running large language models. The picture that emerges is simple but important: serving AI can be far cheaper than the highest market prices suggest, but converting that advantage into actual profit is not automatic.
The company’s figures point to a theoretical profit margin of 545 percent, measured as profit relative to cost, if its services were fully monetized. That estimate sits beside a more complicated reality: Deepseek keeps many services free, charges less for some models, and currently earns revenue only from API access.
The numbers behind Deepseek’s AI costs
During a 24-hour test period, Deepseek’s models handled 608 billion input tokens and 168 billion output tokens. Those figures show the scale of the operation, but the more revealing detail is how much of that traffic did not need to be processed from scratch.
Deepseek served 56.3 percent of input tokens from a cache. In plain terms, caching lets the system reuse previous work when possible instead of spending the same compute resources again. For a language model service, that can change the economics sharply because every saved unit of processing helps reduce the cost of answering users.
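To make the cache figure concrete, the short Python sketch below splits the reported input volume into cached and uncached tokens. It uses only the two numbers quoted above; the function name `split_input_tokens` is a hypothetical helper introduced for illustration, not something from Deepseek's disclosure.

```python
# Figures reported for the 24-hour period.
INPUT_TOKENS = 608_000_000_000   # 608 billion input tokens
CACHE_HIT_RATE = 0.563           # 56.3 percent served from cache


def split_input_tokens(total_tokens: int, hit_rate: float) -> tuple[int, int]:
    """Return (cache_hit_tokens, cache_miss_tokens) for a given hit rate."""
    hits = round(total_tokens * hit_rate)
    return hits, total_tokens - hits


hits, misses = split_input_tokens(INPUT_TOKENS, CACHE_HIT_RATE)
print(f"cache hits:   {hits / 1e9:.1f} billion tokens")
print(f"cache misses: {misses / 1e9:.1f} billion tokens")
```

By this arithmetic, roughly 342 billion input tokens were served from cache and about 266 billion had to be processed in full.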
The company also described a dynamic approach to resource use. During peak daytime hours, all nodes are assigned to inference requests. At night, when demand is lower, resources are shifted toward research and training tasks.
That matters because AI infrastructure costs money whether or not it is busy, and demand rises and falls through the day. A system that can move compute between customer-facing work and internal model work can make better use of the same hardware base.
How hardware turns into daily operating cost
Deepseek said the hardware infrastructure behind this operation costs $87,072 per day. The estimate is based on an average of 226.75 server nodes, with each node using eight Nvidia H800 GPUs. The calculation assumes a leasing cost of two dollars per GPU per hour.
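The daily figure can be reproduced directly from the stated assumptions. The sketch below is plain arithmetic on the numbers in the paragraph above; the function name `daily_hardware_cost` is illustrative only.

```python
# Inputs as stated in the disclosure.
AVG_NODES = 226.75        # average number of server nodes in use
GPUS_PER_NODE = 8         # Nvidia H800 GPUs per node
USD_PER_GPU_HOUR = 2.0    # assumed leasing cost
HOURS_PER_DAY = 24


def daily_hardware_cost(nodes: float, gpus_per_node: int,
                        usd_per_gpu_hour: float,
                        hours: int = HOURS_PER_DAY) -> float:
    """Leasing cost in USD for one day of operation."""
    return nodes * gpus_per_node * usd_per_gpu_hour * hours


cost = daily_hardware_cost(AVG_NODES, GPUS_PER_NODE, USD_PER_GPU_HOUR)
print(f"${cost:,.0f} per day")  # prints "$87,072 per day"
```

The fractional node count reflects an average over the day, which is consistent with capacity being shifted between inference and other work as demand changes.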
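The performance data also helps explain why the cost structure can support large volumes. A single H800 node processes about 73,700 input tokens per second during prefilling, or about 14,800 output tokens per second during decoding. Those are aggregate rates across all the requests a node handles at once; the average output speed of 20 to 22 tokens per second describes how fast an individual response is generated.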
These details are important because language model pricing is usually discussed from the customer side: how much a developer pays for input and output tokens. Deepseek’s disclosure gives a view from the provider side, where the core question is how much hardware is needed to serve the tokens and how efficiently that hardware is scheduled.
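One way to express the provider-side view is a blended hardware cost per million tokens served. The sketch below derives it from the reported daily cost and token volumes. This blended figure is not one Deepseek published, and it treats cached input, uncached input and output tokens as equal, which they are not in compute terms, so it should be read as a rough average only.

```python
# Reported figures for the 24-hour period.
DAILY_COST_USD = 87_072
INPUT_TOKENS = 608_000_000_000
OUTPUT_TOKENS = 168_000_000_000


def cost_per_million_tokens(daily_cost: float, total_tokens: int) -> float:
    """Blended hardware cost in USD per one million tokens served."""
    return daily_cost / (total_tokens / 1_000_000)


blended = cost_per_million_tokens(DAILY_COST_USD, INPUT_TOKENS + OUTPUT_TOKENS)
print(f"blended cost: ${blended:.4f} per million tokens")
```

Under these assumptions the blended cost comes to roughly 11 cents per million tokens, which can be compared with the per-million list prices customers see.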
On paper, the gap between cost and potential revenue looks large. If Deepseek billed every processed token at full premium R1 model rates, daily revenue would reach $562,027. Those premium R1 rates, all quoted per million tokens, are $0.14 for input tokens served from cache, $0.55 for input tokens that miss the cache, and $2.19 for output tokens.
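The theoretical revenue and the 545 percent figure can be approximated from the published numbers. The sketch below is a back-of-the-envelope reconstruction, not Deepseek's own calculation. It assumes the rounded 56.3 percent cache rate applies uniformly, so the result lands slightly below the quoted $562,027, presumably because of rounding in the published inputs.

```python
# Reported volumes and R1 list prices (USD per million tokens).
INPUT_TOKENS = 608_000_000_000
OUTPUT_TOKENS = 168_000_000_000
CACHE_HIT_RATE = 0.563
PRICE_CACHE_HIT = 0.14
PRICE_CACHE_MISS = 0.55
PRICE_OUTPUT = 2.19
DAILY_COST_USD = 87_072


def theoretical_revenue() -> float:
    """Daily revenue if every token were billed at full R1 rates."""
    hit_millions = INPUT_TOKENS * CACHE_HIT_RATE / 1e6
    miss_millions = INPUT_TOKENS * (1 - CACHE_HIT_RATE) / 1e6
    output_millions = OUTPUT_TOKENS / 1e6
    return (hit_millions * PRICE_CACHE_HIT
            + miss_millions * PRICE_CACHE_MISS
            + output_millions * PRICE_OUTPUT)


revenue = theoretical_revenue()
# Profit expressed relative to cost, which is how the 545 percent arises.
profit_over_cost = (revenue - DAILY_COST_USD) / DAILY_COST_USD

print(f"theoretical revenue: ${revenue:,.0f}")
print(f"profit relative to cost: {profit_over_cost:.0%}")
```

Output tokens account for about two thirds of the theoretical total, so the result is most sensitive to the output price. Note that the ratio divides profit by cost; dividing by revenue instead would give a figure below 100 percent.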
Theoretical margin is not the same as cash
The central tension is that the $562,027 figure is theoretical. Deepseek says real-world revenue is much lower because its actual business does not charge full price across all usage.
Several factors reduce the practical revenue opportunity:
- The standard V3 model is priced below R1.
- Most services are offered for free.
- The company applies nightly discounts.
- For now, only API access is generating revenue.
This distinction is crucial. A model provider may have infrastructure capable of supporting high-margin usage, but the market may not allow the company to capture all of that value. Free access can expand reach, lower pricing can attract developers, and discounts can smooth usage patterns, but all of those choices reduce immediate revenue.
Deepseek’s data therefore does not simply say that AI is easy money. It says that efficient model serving can create the possibility of strong margins, while market behavior can make those margins hard to realize.
Why this pressures premium AI pricing
The broader implication is that language models may be moving toward commodity-like services. If multiple providers can offer useful models at lower prices, it becomes harder to justify premium pricing unless the performance advantage is clear enough to matter.
That is why OpenAI’s pricing strategy stands out in this context. The source article notes that GPT-4.5 commands premium prices far above both earlier OpenAI models and competitors such as Deepseek, while offering only modest performance improvements.
Deepseek’s figures sharpen the question for the market: when lower-priced models can be served efficiently, how much more will customers pay for a premium model? The answer depends not only on model quality, but also on whether buyers see enough practical advantage in the more expensive option.
This creates pressure for Western AI companies like OpenAI, which are losing billions and facing significant operating costs while market forces push prices lower. If the model layer becomes easier to compare and cheaper to access, the competitive battle may shift away from the model alone.
The value may move beyond the model
OpenAI GTM Manager Adam Goldberg recently emphasized that success in AI requires control across the full value chain, including infrastructure, data, models and applications. In the context of Deepseek’s numbers, that argument becomes easier to understand.
If language models become more standardized in the eyes of customers, the advantage may come from how well a company connects the whole stack. Infrastructure efficiency can lower cost. Data can shape model behavior. Applications can make the model useful in concrete workflows. Pricing can determine whether demand turns into revenue.
Deepseek’s disclosure is valuable because it separates two ideas that are often blurred together. The first is technical operating potential: with caching, dynamic resource allocation and efficient GPU usage, a provider can serve enormous token volumes at a cost far below theoretical revenue. The second is commercial capture: competition, free tiers, model discounts and lower-priced alternatives can keep actual revenue well below that theoretical ceiling.
For the AI market, that may be the more important lesson. The future of language model profits may depend less on charging the highest possible token price and more on building an operation that can run cheaply, allocate resources intelligently and turn the right parts of usage into paid demand.