Asking why may protect learning when using AI coding tools

An Anthropic study found that developers using AI assistance while learning the Trio library scored 17 percent worse on a later knowledge test than a control group. The strongest learning outcomes came when developers used AI for concepts, explanations, and follow-up questions rather than simple delegation.

AI coding tools can help developers move through unfamiliar work, but a new Anthropic study suggests that speed is not the same as learning. When developers used AI assistance to learn a new programming library, they showed weaker understanding afterward unless they used the assistant in ways that forced them to explain, question, and reason.

What Anthropic Tested

The study focused on 52 mainly junior software developers. Each participant had worked regularly with Python for at least a year, had experience with AI assistants, and did not know the Trio library before the experiment.

Participants were split randomly into two groups. One group could use a GPT-4o-based AI assistant. The control group worked with documentation and web search only. Both groups were asked to complete two programming tasks with Trio as quickly as possible.

After the tasks, everyone took a knowledge test on the concepts they had used. The result was the central warning from the study: participants with AI access scored 17 percent worse than the control group. The AI assistant also did not produce a statistically significant time saving on average.

Researchers Judy Hanwen Shen and Alex Tamkin described the workplace implication directly: "Our results suggest that incorporating AI aggressively into the workplace, particularly with respect to software engineering, comes with trade-offs."

Delegating Code Is Different From Learning Code

The study did not find that every use of AI was equally harmful. How developers interacted with the assistant made a large difference. A qualitative analysis of screen recordings from 51 participants found six interaction patterns, and those patterns produced very different learning outcomes.

Three patterns were linked with weak test results, with quiz scores between 24 and 39 percent. The fastest group was made up of participants who handed programming work over to the AI, but their quiz score was only 39 percent. Others started independently and then leaned more and more on AI-generated answers. The weakest pattern involved repeatedly asking AI for debugging help without understanding the errors.

Those approaches share a common problem: they reduce the developer's need to build a mental model. The code may run, and the task may be finished, but the person may not have practiced the reasoning needed to recognize the same concept later.

Three other patterns kept learning much stronger, with quiz scores between 65 and 86 percent. The best approach was to let AI generate code and then ask specific follow-up questions. Asking for explanations while generating code also worked well. So did using AI only for conceptual questions.

In practical terms, the study points to a simple distinction. AI use that replaces thinking can weaken learning. AI use that prompts explanation, comparison, and conceptual checking can support it.

Why Productivity Gains Did Not Show Up

The results also complicate a common argument for AI programming assistants: that they make developers faster. In this study, AI users were not statistically faster overall.

The screen recordings help explain why. Some participants spent up to eleven minutes interacting with the assistant, including time spent shaping prompts. That extra conversational work could offset any time saved by generated code.

Only about 20 percent of participants in the AI group used the assistant exclusively for code generation. That group was faster than the control group, but it also had the worst learning results. The other AI users spent time asking questions, requesting explanations, or trying to understand what the assistant produced.

Anthropic's researchers suspect that AI may deliver stronger productivity benefits on repetitive or familiar tasks. That matters because earlier studies finding significant productivity gains measured work where participants already had the needed skills. This study examined a different situation: learning something new.

Mistakes May Be Part Of The Training

The control group made more mistakes than the AI group and had to deal with errors three times more often. That sounds inefficient, but the study suggests those errors may have been useful.

When participants without AI became stuck, they had to inspect the code, reason through the problem, and connect the error to Trio's concepts. The researchers describe this kind of struggle as potentially important for competence. The study specifically notes that the largest differences on the knowledge test appeared in debugging questions.

The control group encountered more Trio-specific errors during the tasks, including runtime warnings for unexpected coroutines. Those experiences appear to have helped them understand core ideas that the AI group could bypass.
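To make that kind of error concrete, here is a minimal sketch of one common way a coroutine warning arises in async Python: calling an async function without `await`. It is an illustration, not a reproduction of the study's tasks. The article does not say which warning participants saw, so the "never awaited" case is an assumption. The sketch uses the standard library's asyncio as a stand-in for Trio so that it runs without extra installs, and the names `fetch_value`, `buggy`, `fixed`, and `run_and_collect` are hypothetical. The warning itself comes from the Python interpreter rather than from any one async library.

```python
import asyncio
import gc
import warnings


async def fetch_value():
    """Hypothetical coroutine standing in for any async library call."""
    await asyncio.sleep(0)
    return 42


async def buggy():
    # Bug: `await` is missing, so this creates a coroutine object
    # instead of running fetch_value and getting its result.
    result = fetch_value()
    kind = type(result).__name__
    # Discarding the un-awaited coroutine makes Python emit a RuntimeWarning.
    del result
    return kind


async def fixed():
    # Fix: awaiting runs the coroutine and returns its value.
    return await fetch_value()


def run_and_collect(coro_fn):
    """Run coro_fn and return (result, warning messages seen while running)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = asyncio.run(coro_fn())
        gc.collect()  # make sure any discarded coroutine is finalized
    return result, [str(w.message) for w in caught]


if __name__ == "__main__":
    print(run_and_collect(buggy))
    print(run_and_collect(fixed))
```

A learner who hits this warning unaided has to work out that calling an async function only creates a coroutine and that `await` is what runs it, which is the kind of concept the debugging questions on the knowledge test would plausibly probe.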

This does not mean developers should avoid assistance. It means that friction can carry information. If an AI assistant removes the difficult moment before the learner has understood it, it may also remove part of the lesson.

What This Means For Software Teams

The study's warning is especially important for safety-critical applications. If humans are expected to review and debug AI-generated code, they still need the skills to do that work. Those skills may weaken if AI tools are adopted in ways that prevent developers from forming them in the first place.

The researchers summarize the risk clearly: "Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation."

For teams, the study suggests that AI coding tools should not be treated as a universal shortcut for onboarding, training, or new-library adoption. The safer pattern is to keep cognitive effort in the loop. Developers should ask why code works, request explanations, and use AI to clarify concepts rather than simply accept finished answers.

The researchers also note limits to the study. It looked at a one-hour task and a chat-based interface. Agent-based AI systems like Claude Code, which require even less human involvement, could make the negative effect on skill development worse. Whether similar effects happen in other knowledge work, such as writing or design, has not been studied.

Anthropic is also working on automating non-coding jobs through tools like Claude Cowork. That makes the study notable: the company earns money from AI assistants such as Claude, yet its own research department is warning that these products can carry real trade-offs for skill formation.