AI agents are moving from answering questions toward acting on behalf of users. A recent study suggests that shift could create a new problem: when two AI systems bargain with each other, the weaker one may consistently get the worse deal.
The research tested AI-to-AI price negotiation and found that more capable models often secured better financial outcomes. That matters because autonomous agents are being positioned as tools that could shop, compare offers, and eventually negotiate for people or companies.
What the study tested
The study, posted to the preprint site arXiv, examined what happens when AI models act as buyers and sellers. The researchers used three negotiation settings: electronics, motor vehicles, and real estate.
Seller agents were given product specifications, wholesale cost, and retail price. Their instruction was to maximize profit. Buyer agents received a budget, the retail price, and ideal product requirements. Their goal was to push the price down.
Neither side had full information. That design matters because many negotiations work the same way. Buyers and sellers often know their own limits, needs, and incentives, but they do not fully understand the other party’s position.
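The setup described above can be pictured as two private briefs, one per agent. The following is a minimal sketch, not the researchers' code: the class names, field names, and numbers are hypothetical, and it assumes a seller will not go below wholesale cost and a buyer will not pay more than the budget or the retail price, which the study description does not state as hard rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SellerBrief:
    """Private information given to the seller agent (hypothetical layout)."""
    product_spec: str
    wholesale_cost: float
    retail_price: float


@dataclass(frozen=True)
class BuyerBrief:
    """Private information given to the buyer agent (hypothetical layout)."""
    budget: float
    retail_price: float
    requirements: str


def feasible_price_range(
    seller: SellerBrief, buyer: BuyerBrief
) -> Optional[Tuple[float, float]]:
    """Return the (low, high) prices both sides could accept, or None.

    Assumption: the seller's floor is wholesale cost and the buyer's
    ceiling is the smaller of budget and retail price.
    """
    low = seller.wholesale_cost
    high = min(buyer.budget, buyer.retail_price)
    return (low, high) if low <= high else None


# Illustrative numbers only; they do not come from the study.
seller = SellerBrief("mid-range laptop", wholesale_cost=600.0, retail_price=900.0)
buyer = BuyerBrief(budget=800.0, retail_price=900.0, requirements="16 GB RAM")
print(feasible_price_range(seller, buyer))  # (600.0, 800.0)
```

Neither agent can compute this range on its own, because each sees only its own brief. That is the information gap the negotiation has to work around.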
The result was not a level playing field. More advanced models tended to bargain more effectively. Less advanced ones were more likely to leave money on the table, either by paying too much as buyers or earning too little as sellers.
Stronger models got better deals
OpenAI’s ChatGPT-o3 had the strongest overall negotiation results in the experiment. It was followed by GPT-4.1 and o4-mini. GPT-3.5, described in the source as the oldest model included in the study, performed much worse in both roles: it made the least money as a seller and spent the most as a buyer.
DeepSeek R1 and V3 also performed well, especially as sellers. Qwen2.5 trailed behind overall, though it showed more strength when acting as a buyer.
The researchers also saw different trade-offs across models. Some agents often failed to close deals, but when they did complete a negotiation, they protected profit well. Others completed more transactions but accepted lower margins.
GPT-4.1 and DeepSeek R1 showed the strongest balance between profit and completion rate. That distinction is important because a useful AI negotiator cannot simply demand the best possible price. It also has to finish the negotiation when a workable agreement is available.
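The trade-off between profit and completion rate can be made concrete with a little arithmetic. The sketch below is an illustration with made-up numbers, and the function and metric names are assumptions rather than the study's own measures. It scores a seller by completion rate, average profit on closed deals, and average profit per attempted negotiation.

```python
from typing import Dict, Optional, Sequence


def summarize_seller(profits: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize negotiation attempts; None marks a negotiation with no deal."""
    closed = [p for p in profits if p is not None]
    attempts = len(profits)
    return {
        "completion_rate": len(closed) / attempts if attempts else 0.0,
        "mean_profit_closed": sum(closed) / len(closed) if closed else 0.0,
        "profit_per_attempt": sum(closed) / attempts if attempts else 0.0,
    }


# Hypothetical agents: one holds out for high margins, one closes more often.
holdout = summarize_seller([200.0, None, None, None])
closer = summarize_seller([100.0, 100.0, 100.0, None])

print(holdout)  # completion 0.25, 200.0 per closed deal, 50.0 per attempt
print(closer)   # completion 0.75, 100.0 per closed deal, 75.0 per attempt
```

In this toy case the agent that earns less per deal still comes out ahead per attempt, which is why a high margin alone does not make a good negotiator.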
Why this could widen a digital divide
The larger concern is not only that some AI agents bargain better than others. It is that access to stronger AI could shape real financial outcomes.
Jiaxin Pei, a postdoctoral researcher at Stanford University and one of the study’s authors, described the risk directly: "Over time, this could create a digital divide where your financial outcomes are shaped less by your negotiating skill and more by the strength of your AI proxy."
That warning follows logically from the study’s setup. If AI agents negotiate prices and one side has a more capable system, the user behind that system may gain an advantage. A person or business with weaker AI could face worse prices, weaker margins, or failed deals.
The source identifies several possible reasons for the performance gap. These include training data, reasoning ability, the capacity to infer missing information, and model size. The exact causes remain uncertain, but the study found a clear pattern: larger models within the same model family consistently made better deals as both buyers and sellers.
This is where price negotiation becomes more than a convenience feature. If bargaining moves into AI-to-AI interactions, differences in model capability could quietly affect who benefits from commerce and who loses ground.
Failure modes matter in high-stakes use
The study also found that AI agents did not only make weaker bargains. They sometimes failed in ways that would be risky in real transactions.
Some agents became stuck in long negotiation loops without reaching agreement. Others stopped too early, even when they had been told to pursue the best possible deal. The source notes that even the most capable models were vulnerable to these problems.
Pei said the result was unexpected. "The result was very surprising to us," he said. "We all believe LLMs are pretty good these days, but they can be untrustworthy in high-stakes scenarios."
This connects to a wider research concern about AI agents in financial decision-making. Earlier this month, a group of researchers from multiple universities argued that LLM agents should be judged mainly by their risk profiles, not only by their best performance. The issue is not just how well an agent can do when everything works. It is also how safely it fails when conditions are difficult.
That group warned that in real-world finance, even a tiny weakness, including a 1% failure rate, could create systemic risks. They recommended that AI agents be stress tested before practical deployment.
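A quick calculation shows why a 1% failure rate is less small than it sounds once an agent handles many transactions. This is a simplified sketch that assumes every transaction fails independently with the same probability, which real systems may not satisfy. The function name is hypothetical.

```python
def prob_at_least_one_failure(failure_rate: float, transactions: int) -> float:
    """Chance of one or more failures across independent transactions."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    # P(no failures) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - failure_rate) ** transactions


for n in (1, 100, 1000):
    print(n, round(prob_at_least_one_failure(0.01, n), 3))
```

Under those assumptions, a 1% per-transaction failure rate gives roughly a 63% chance of at least one failure across 100 transactions, and a failure becomes close to certain across 1,000.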
Why real-world use is still limited
The price negotiation study also has limits. Hancheng Cao, an incoming assistant professor at Emory University, noted that the experiments used simulated environments and may not fully represent real-world negotiations or user behavior.
Even so, the findings help explain why companies may be cautious. Many AI shopping tools today focus on product recommendation rather than bargaining. In April, Amazon launched "Buy for Me," an AI agent that helps customers find and buy products from other brands’ sites when Amazon does not sell them directly.
Price negotiation is still rare in consumer e-commerce, but it is more common in business-to-business transactions. Alibaba.com has rolled out Accio, a sourcing assistant built on its open-source Qwen models, to help businesses find suppliers and research products. The company told MIT Technology Review that so far it has no plans to automate price bargaining, citing the high risk.
Researchers and industry practitioners are testing ways to reduce the risks. The source names several approaches: better prompts, external tools or code, multiple models checking each other’s work, and fine-tuning on domain-specific financial data. These methods have shown promise in improving performance.
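As one illustration of the "external tools or code" idea, a negotiation could pass each model decision through a simple rule check before it takes effect. The sketch below is hypothetical and is not drawn from the study or from any vendor's product. The names, the round limit, and the rule that a seller never accepts a price below wholesale cost are all assumptions.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"      # the model's acceptance may go through
    COUNTER = "counter"    # reject the price and keep negotiating
    ESCALATE = "escalate"  # stop and hand the decision to a person


def review_seller_acceptance(
    proposed_price: float,
    wholesale_cost: float,
    round_number: int,
    max_rounds: int = 10,
) -> Verdict:
    """Check a seller agent's wish to accept a price against fixed rules."""
    if round_number > max_rounds:
        # Guards against the long negotiation loops described earlier.
        return Verdict.ESCALATE
    if proposed_price < wholesale_cost:
        # Assumed rule: never close a sale at a loss.
        return Verdict.COUNTER
    return Verdict.ACCEPT


print(review_seller_acceptance(650.0, wholesale_cost=600.0, round_number=3))
print(review_seller_acceptance(550.0, wholesale_cost=600.0, round_number=3))
print(review_seller_acceptance(650.0, wholesale_cost=600.0, round_number=11))
```

A check like this does not make the model a better bargainer. It only limits how badly a single negotiation can go wrong.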
For now, Pei’s advice is cautious. AI shopping assistants may be useful for information gathering, but not yet as full decision-makers. "I don’t think we are fully ready to delegate our decisions to AI shopping agents," he said. "So maybe just use it as an information tool, not a negotiator."