Why AI agents are advancing fastest in software development

Anthropic’s analysis of millions of real human-agent interactions shows that AI agents are becoming more independent. But adoption is heavily concentrated in software development, while other fields remain at an early stage.

AI agents are moving from short, closely supervised interactions toward longer stretches of independent work. Anthropic’s analysis of millions of real interactions from Claude Code and the public API suggests a clear pattern: agentic systems are becoming more capable in practice, but their use is still concentrated in one domain.

Software development is where the activity is happening at scale. Other areas, including business intelligence, customer service, sales, finance, and e-commerce, are still far behind.

Software development dominates agent use

According to the study, software engineering accounts for nearly 50 percent of all agent tool calls through the public API. That makes it the largest visible center of agent adoption in Anthropic’s data.

The gap is striking because the other industries named in the analysis are not small or marginal fields. Business intelligence, customer service, sales, finance, and e-commerce are all obvious candidates for automation and assistance. Yet none of them claims more than a few percentage points of traffic.

Anthropic describes the current moment as the "early days of agent adoption." The phrase matters because it frames the software development lead as a starting point, not necessarily the final shape of the market.

Developers were among the first to build agent-based tools and use them in real workflows. That gives software engineering a head start: the work is already digital, tool-heavy, and structured around tasks that can often be broken down into steps. The source data does not say that other industries cannot adopt agents. It shows that, so far, they have not adopted them at the same scale.

Claude Code is running longer without intervention

One of Anthropic’s central findings concerns how long Claude Code works before a person steps in. The median work step lasts around 45 seconds, a figure that has remained relatively stable.

At the far end of the distribution, however, the change is much larger. The 99.9th percentile nearly doubled between October 2025 and January 2026, rising from under 25 minutes to over 45 minutes.

That pattern suggests that the longest autonomous runs are stretching out even while the typical one remains short. In practical terms, most work steps are still brief, but some users are letting the agent continue for much longer.
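To see how a stable median and a growing tail can coexist, consider a minimal sketch in Python. The durations are synthetic, chosen only to echo the shape described here; they are not Anthropic's data. The nearest-rank `percentile` helper is an illustrative assumption, not the method used in the study.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value that has at least
    p percent of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic work-step durations in seconds (NOT real data):
# 9,980 typical steps of 45 seconds plus 20 long-running steps.
earlier = [45] * 9980 + [24 * 60] * 20  # long steps last 24 minutes
later = [45] * 9980 + [46 * 60] * 20    # long steps last 46 minutes

for label, durations in (("earlier", earlier), ("later", later)):
    median = percentile(durations, 50)
    tail = percentile(durations, 99.9)
    print(f"{label}: median={median}s, p99.9={tail / 60:.0f} min")
```

In this toy data set the median stays at 45 seconds in both periods, while the 99.9th percentile moves from 24 to 46 minutes. Only a small fraction of steps has to change for the tail to shift that far.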

Anthropic notes that the increase is steady across different model releases. If the change came only from improved model capability, the data might show sharper jumps around releases. Instead, the trend points to several forces working together: users gaining experience, users trusting the system with more ambitious tasks, and the product improving over time.

The gap between capability and deployment

Anthropic uses the term "deployment overhang" to describe the gap between what models could handle and what they are actually asked to do in the real world. The idea is that technical capability may be ahead of day-to-day usage.

The source connects this view to similar arguments from OpenAI and from Microsoft CEO Satya Nadella, who have said that AI models can already do more than people ask of them. Anthropic also points to an evaluation by METR estimating that Claude Opus 4.5 can complete, with a 50 percent success rate, tasks that would take a human nearly five hours.

The implication is not that agents should be turned loose without limits. It is that users and organizations may still be learning how much autonomy to grant, where oversight should sit, and which kinds of tasks are suitable for longer agent runs.

That learning curve appears directly in the Claude Code usage data. New users fully auto-approve about 20 percent of sessions. After roughly 750 sessions, that figure climbs past 40 percent.

More experienced users also interrupt slightly more often. The interruption rate rises from about 5 percent of work steps for new users to around 9 percent for experienced ones. Anthropic reads this as a change in operating style: newer users approve more steps one by one, while experienced users allow more autonomy and step in when needed.

Even then, intervention remains limited. Experienced users still let more than 90 percent of work steps run without interruption.

Oversight changes as tasks become more complex

The public API shows a related pattern. For simple work, such as editing a line of code, 87 percent of tool calls involve some form of human oversight.

For more complex tasks, including autonomously finding zero-day exploits or writing a compiler, that oversight figure falls to 67 percent. The source does not say that complex tasks are less risky. Instead, the data shows that when users assign harder work, they often allow the agent more room to operate.

Claude Code also appears to apply its own brake during demanding work. For the most difficult tasks, it pauses to ask questions more than twice as often as it does for minimal-complexity work.

Anthropic presents that behavior as a safety mechanism. A model that can identify uncertainty and ask for confirmation adds another layer alongside external controls such as authorization systems and human approvals.
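The layering described here can be pictured with a small Python sketch. Everything in it is hypothetical: the `ToolCall` fields, the `gate` function, and the three outcomes are illustrative names, not part of Claude Code or any Anthropic API. It only shows how an agent's own uncertainty check might sit between an authorization list and a human approval step.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ASK_USER = "ask_user"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ToolCall:
    name: str
    model_uncertain: bool  # hypothetical: the agent reports its own uncertainty
    needs_approval: bool   # hypothetical: set by the product's approval policy


def gate(call, allowed_tools, auto_approve):
    """Apply three independent layers in order; the first that objects wins."""
    # Layer 1: external authorization. Unlisted tools never run.
    if call.name not in allowed_tools:
        return Decision.BLOCKED
    # Layer 2: the agent's own brake. Uncertainty triggers a question,
    # even when the session is otherwise auto-approved.
    if call.model_uncertain:
        return Decision.ASK_USER
    # Layer 3: human approval, skipped only if the user opted in to auto-approve.
    if call.needs_approval and not auto_approve:
        return Decision.ASK_USER
    return Decision.PROCEED


allowed = {"edit_file", "run_tests"}
print(gate(ToolCall("edit_file", False, True), allowed, auto_approve=True))
print(gate(ToolCall("edit_file", True, True), allowed, auto_approve=True))
print(gate(ToolCall("drop_database", False, False), allowed, auto_approve=True))
```

The design choice worth noting is that the uncertainty check runs before the auto-approve shortcut, so granting more autonomy in this sketch does not switch off the agent's own pause.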

What broader adoption could require

Anthropic expects agents at high levels of risk and autonomy to become more common, especially if adoption spreads beyond software engineering into higher-stakes industries. That would make the design of oversight systems more important, not less.

The company recommends broader post-deployment monitoring for model developers, product builders, and policymakers. At the same time, it warns against requiring a single rigid interaction pattern for every agent system.

One example is manual approval for every agent action. Anthropic argues that this can create friction without necessarily improving safety.

The larger lesson is that AI agent adoption is not only a model-capability story. It is also about user trust, product design, task selection, and the practical rules that determine when an agent can act, when it should ask, and when a human needs to intervene.

For now, software development is the clearest test case. It shows agents becoming more autonomous in real use, while also showing how much of the broader economy has yet to make the same shift.