AI coding tools are now common in software engineering workflows, with products like Cursor and GitHub Copilot promising to write code, fix bugs, and test changes faster. A new METR study complicates that promise: in one trial with experienced developers, access to AI tools made the work slower, not faster.
What METR Tested
METR, a non-profit AI research group, published a new study on Thursday that examined whether today’s AI coding tools improved productivity for experienced developers. The study focused on practical engineering work rather than artificial coding puzzles.
The researchers recruited 16 experienced open source developers. Those developers completed 246 real tasks on large code repositories they regularly contribute to, which matters because the work happened inside familiar projects rather than unfamiliar toy environments.
METR used a randomized controlled trial. Roughly half of the tasks were assigned as AI-allowed, meaning developers could use state-of-the-art AI coding tools such as Cursor Pro. The remaining tasks forbade the use of AI tools.
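The assignment step can be pictured with a short Python sketch. The article says only that the design was randomized and that roughly half of the tasks ended up AI-allowed. The independent coin flip per task, the function name `assign_conditions`, the seed, and the placeholder task names are all assumptions for illustration, not METR's actual procedure.

```python
import random


def assign_conditions(task_ids, seed=0):
    """Label each task 'ai_allowed' or 'ai_disallowed' with equal probability.

    Assumption: one independent coin flip per task. The study's real
    randomization scheme is not described in this article.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {
        task_id: rng.choice(["ai_allowed", "ai_disallowed"])
        for task_id in task_ids
    }


# 246 placeholder task names, matching only the task count reported in the study.
tasks = [f"task-{n}" for n in range(1, 247)]
assignment = assign_conditions(tasks)

allowed = sum(1 for cond in assignment.values() if cond == "ai_allowed")
print(f"{allowed} of {len(tasks)} tasks are AI-allowed")
```

Because each task is flipped independently, the split lands near half rather than exactly on it, which is consistent with the "roughly half" wording above.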
Before starting, the developers expected AI assistance to help. They forecasted that using AI coding tools would reduce completion time by 24%.
The Result Went The Other Way
The measured outcome was the opposite of what the developers predicted. METR found that allowing AI increased completion time by 19%.
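To make the gap between forecast and measurement concrete, the two percentages can be applied to a single task. This is a rough illustration only: the 100-minute baseline is a made-up number, and the study's figures describe effects across many tasks rather than any one task.

```python
def forecast_minutes(baseline_minutes: float, reduction: float = 0.24) -> float:
    """Time the developers expected with AI: a 24% reduction from the baseline."""
    return baseline_minutes * (1.0 - reduction)


def measured_minutes(baseline_minutes: float, increase: float = 0.19) -> float:
    """Time implied by METR's measured effect: a 19% increase over the baseline."""
    return baseline_minutes * (1.0 + increase)


baseline = 100.0  # hypothetical task length in minutes, not a figure from the study
expected = forecast_minutes(baseline)
observed = measured_minutes(baseline)

print(f"forecast with AI: {expected:.1f} min")
print(f"measured with AI: {observed:.1f} min")
print(f"gap between forecast and measurement: {observed - expected:.1f} min")
```

Under these assumptions, a task that takes 100 minutes without AI was forecast to take about 76 minutes with it, but the measured effect implies about 119 minutes, a gap of roughly 43 minutes.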
That finding is important because the participants were not beginners. They were experienced open source developers working in repositories they already knew. The result suggests that AI coding tools can add friction even when the person using them understands the codebase and the task.
Still, the details matter. Only 56% of the developers in the study had experience using Cursor, the main AI tool offered in the study. Nearly all of them, 94%, had experience using some web-based LLMs in their coding workflows. For some participants, however, the study was their first time using Cursor specifically.
The researchers also note that developers were trained on using Cursor before the study. That does not erase the possibility of a learning curve, but it does show METR did not simply hand participants an unfamiliar tool with no preparation.
Why The Tools May Have Slowed Work
METR researchers pointed to several possible reasons for the slowdown. One was the time developers spent prompting AI and waiting for responses. In those moments, the developer is interacting with the tool rather than directly changing the code.
That distinction is central to the productivity question. AI coding tools can generate code, suggest fixes, and help with tests, but using them still requires direction, review, and decisions from the developer. If the tool’s output needs repeated prompting or careful checking, the workflow can become slower.
The study also used large, complex codebases. METR researchers noted that AI tends to struggle in that setting. A complex repository can carry local conventions, dependencies, and context that are difficult for a coding tool to handle cleanly.
The result is a more nuanced view of so-called vibe coding. AI tools may feel fast when they produce a first draft, but the full task includes understanding the repository, integrating the change, checking behavior, and making sure the result fits the project.
What The Study Does And Does Not Prove
The study does not show that AI coding tools always reduce productivity. The authors were careful about that point. They explicitly noted that they do not believe AI systems currently fail to speed up many or most software developers. Put plainly, the slowdown measured in this trial is not a claim about developers in general.
Other large-scale studies have shown that AI coding tools do speed up software engineering workflows. METR’s own framing leaves room for different results across different developers, tools, tasks, and codebases.
The timing also matters. AI models from OpenAI, Google DeepMind, Anthropic, and xAI have rapidly improved their performance on software engineering tests in recent years. The study’s authors said they would not expect the same results even three months from now.
METR has also found that AI coding tools have significantly improved their ability to complete complex, long-horizon tasks in recent years. That makes the study less of a final verdict and more of a warning against assuming productivity gains are automatic.
The Practical Takeaway For Developers
The clearest lesson is that AI coding tools should be evaluated inside the actual workflow where they will be used. A tool that helps in one setting may slow work in another, especially when the codebase is large and complex.
Developers and teams should be careful with broad productivity claims. The METR study suggests that the real cost of using AI can include prompting, waiting, reviewing, and correcting. Those costs may be easy to miss if the focus is only on how quickly a tool can produce code.
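One way a team might surface those costs is to log time by activity rather than only total time. The sketch below is a minimal illustration: the `TaskTimeLog` class, its field names, and the sample minutes are all hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class TaskTimeLog:
    """Minutes spent per activity on one task (hypothetical categories)."""

    prompting: float = 0.0        # writing and refining prompts
    waiting: float = 0.0          # waiting for the tool to respond
    reviewing: float = 0.0        # reading and checking generated code
    correcting: float = 0.0       # fixing generated code
    hands_on_coding: float = 0.0  # editing code directly, without the tool

    def ai_overhead(self) -> float:
        """Time spent interacting with or cleaning up after the tool."""
        return self.prompting + self.waiting + self.reviewing + self.correcting

    def total(self) -> float:
        """Total time on the task across all activities."""
        return self.ai_overhead() + self.hands_on_coding

    def overhead_share(self) -> float:
        """Fraction of total time that is AI overhead (0.0 for an empty log)."""
        total = self.total()
        return self.ai_overhead() / total if total > 0 else 0.0


# Made-up numbers for a single task, for illustration only.
log = TaskTimeLog(prompting=10, waiting=5, reviewing=15, correcting=10, hands_on_coding=20)
print(f"total: {log.total():.0f} min, AI overhead: {log.ai_overhead():.0f} min "
      f"({log.overhead_share():.0%} of the task)")
```

A log like this only shows where the time went. Whether the tool helped still depends on comparing against similar tasks done without it.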
The study also fits with broader concerns about current AI coding tools. Other studies have shown that today’s tools can introduce mistakes and, in some cases, security vulnerabilities. Speed is only one part of the engineering outcome; correctness and safety remain part of the work.
For experienced developers, the message is not to avoid AI coding tools. It is to treat them as tools whose value depends on the task, the repository, and the user’s fluency with the system. METR’s result challenges the simple idea that adding AI to a coding workflow automatically makes every developer faster.