The Decoder, June 14, 2025

Why Claude Research uses parallel AI agents for complex search

Anthropic has described how Claude Research uses a lead agent and several specialized sub-agents to handle complex searches in parallel. Internal tests showed a 90.2 percent gain over a standalone Claude Opus 4 agent, though the approach uses about 15 times more tokens than standard chats.

Parallel agent delegation modestly increases AI autonomy and capability, but the story is mostly a technical performance update.

Anthropic has shared technical details about Claude Research, a system designed to handle complex information searches by splitting work across multiple AI agents. Instead of asking one model to do everything in sequence, the system assigns a coordinating role to one agent and sends specialized sub-agents to gather information in parallel.

The result, according to Anthropic's internal testing, is a research workflow that can move faster and cover more ground than a single-agent setup. The tradeoff is clear: the system consumes far more tokens, and its future direction introduces harder problems in coordination, state management, and error handling.

How the Claude Research agent is organized

The architecture starts with a lead agent. Its job is to read the user's prompt, understand what the request needs, and create a search strategy. After that, it launches several specialized sub-agents, each focused on finding information that contributes to the final answer.

This matters because complex search tasks often contain multiple parts. A single agent can still work through them, but it has to move step by step. Anthropic's design allows parts of the research process to happen at the same time, which is why the company describes the approach as better suited to queries that need large amounts of information.

In the version described by Anthropic, Claude Opus 4 acts as the main coordinator. Claude Sonnet 4 is used for the sub-agents. That split gives the system a central planner while still allowing multiple search efforts to run in parallel.

The core idea is not simply to add more agents for the sake of scale. The lead agent has to decide what work should be divided, where each sub-agent should look, and how the pieces should come back together. The system is therefore built around delegation as much as raw model capability.
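The delegation pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic's implementation: `plan_subtasks`, `run_subagent`, and `lead_agent` are hypothetical names, the planner is a hard-coded stub where a real lead agent would call a model, and the sub-agents sleep briefly instead of searching.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    subtask: str
    summary: str


def plan_subtasks(query: str) -> list[str]:
    # Hypothetical planner: a real lead agent would ask a model to
    # decompose the query instead of using fixed angles.
    angles = ("background", "recent results", "open problems")
    return [f"{query}: {angle}" for angle in angles]


async def run_subagent(subtask: str) -> Finding:
    # Stand-in for a sub-agent that would run searches with tools.
    await asyncio.sleep(0.01)
    return Finding(subtask=subtask, summary=f"notes on {subtask}")


async def lead_agent(query: str) -> str:
    subtasks = plan_subtasks(query)
    # All sub-agents run concurrently; gather returns results in the
    # same order as the subtasks, regardless of which finishes first.
    findings = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    # The lead agent merges the pieces into one report.
    return "\n".join(f"- {f.summary}" for f in findings)


if __name__ == "__main__":
    print(asyncio.run(lead_agent("multi-agent search")))
```

The sketch has the same three steps as the description: plan, fan out, merge. The lead agent does not continue until every sub-agent in the batch has returned.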

Why parallel search changes performance

Anthropic says its internal tests found that the multi-agent system outperformed a standalone Claude Opus 4 agent by 90.2 percent. The source describes that as a gain from running several specialized agents in parallel, rather than relying on one agent to perform the whole search alone.

The performance improvement is tied to several factors:

Parallel processing: sub-agents can search for different parts of the answer at the same time.
Tool use: the number of tool calls contributed to additional improvement in internal testing.
Model selection: the choice of model also affected results.
Token budget: multi-agent runs used much more context and generation capacity than standard chats.

Anthropic's testing found that token use alone explained about 80 percent of the variance in performance. That makes token consumption one of the most important variables in the system. More tokens allow the agents to process and produce more information, but they also make the workflow more expensive to run.

The source gives one comparison that shows tokens are not the entire story. Upgrading to Claude Sonnet 4 produced a larger performance boost than doubling the token budget on Claude 3.7 Sonnet. In other words, spending more tokens helped, but the model and the surrounding tool setup still mattered.

The cost side of multi-agent work

The biggest constraint in the described system is token consumption. Anthropic says multi-agent runs use about 15 times more tokens than standard chats. An increase of that size makes the approach a tool for demanding research tasks rather than a default mode for every request.
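To make the 15x figure concrete, here is a small back-of-the-envelope calculation. Only the multiplier comes from the article. The baseline of 4,000 tokens per chat and the price of 10 US dollars per million tokens are made-up assumptions chosen for easy arithmetic, not Anthropic's numbers.

```python
# "About 15 times more tokens than standard chats", per the article.
MULTI_AGENT_TOKEN_MULTIPLIER = 15


def estimate_multi_agent_tokens(chat_tokens: int) -> int:
    """Scale a standard chat's token count by the reported multiplier."""
    return chat_tokens * MULTI_AGENT_TOKEN_MULTIPLIER


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to a cost at an assumed flat price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    chat_tokens = 4_000   # assumption: tokens used by one standard chat
    price = 10.0          # assumption: USD per million tokens
    research_tokens = estimate_multi_agent_tokens(chat_tokens)
    print(research_tokens)
    print(round(estimate_cost_usd(chat_tokens, price), 2))
    print(round(estimate_cost_usd(research_tokens, price), 2))
```

Under those assumed numbers, a chat that costs about 4 cents becomes a research run of 60,000 tokens costing about 60 cents. The absolute figures are invented; the ratio is the point.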

For simple prompts, a multi-agent system may be unnecessary. Anthropic sees the current architecture as best suited for searches that require large amounts of information and can benefit from parallel processing. That distinction is important: the design is not presented as a universal replacement for standard chat, but as a way to improve difficult research-style tasks.

The evaluation process is also part of the system. Anthropic uses an LLM as a judge to score outputs for factual accuracy, source quality, and tool use. The company says this method is more reliable and efficient than traditional evaluation techniques.
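A minimal sketch of the LLM-as-judge idea might look like the following. The three criteria are the ones named in the article. Everything else is an assumption for illustration: the 0.0 to 1.0 scale, the prompt wording, the unweighted average, and the `fake_judge` stub that stands in for a real model call.

```python
from statistics import mean
from typing import Callable

# Criteria named in the article; the scoring scale below is an assumption.
RUBRIC = ("factual accuracy", "source quality", "tool use")


def build_judge_prompt(answer: str, criterion: str) -> str:
    return (
        f"Rate the following research answer for {criterion} "
        f"on a scale from 0.0 to 1.0. Reply with the number only.\n\n{answer}"
    )


def grade(answer: str, ask_model: Callable[[str], str]) -> dict[str, float]:
    scores: dict[str, float] = {}
    for criterion in RUBRIC:
        reply = ask_model(build_judge_prompt(answer, criterion))
        # Clamp so an out-of-range numeric reply stays within 0..1.
        scores[criterion] = min(1.0, max(0.0, float(reply.strip())))
    # Assumption: equal weighting of the three criteria.
    scores["overall"] = mean(scores[c] for c in RUBRIC)
    return scores


def fake_judge(prompt: str) -> str:
    # Placeholder for a real model call; always returns the same score.
    return "0.5"


if __name__ == "__main__":
    print(grade("Example answer with cited sources.", fake_judge))
```

Passing the judge in as a function keeps the rubric logic separate from any particular model API, so the same grading code could be pointed at a different judge.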

That approach places one AI system in the role of assessing another AI system's work. It also reflects a broader pattern in the architecture: models are not only generating answers, but also planning, supervising, judging, and improving the behavior of other model-driven components.

Claude 4 as its own prompt engineer

Anthropic also claims that, in specific scenarios, Claude 4 can identify its own mistakes and revise tool descriptions to improve performance over time. The source describes this as a case where the system effectively acts as its own prompt engineer.

This is a notable detail because it points beyond one-off search execution. If an agent can notice where its tools or instructions are underperforming and adjust descriptions accordingly, then part of the improvement loop can happen inside the AI workflow itself.

The source does not claim that this solves every reliability problem. It says this behavior appears in specific scenarios. That limitation matters, especially for systems that depend on accurate search, careful tool use, and coherent coordination among multiple agents.

What comes next for agentic AI

Anthropic's next stated direction is asynchronous execution. In the current setup described by the source, the lead agent still has to wait for all sub-agents to finish before moving forward. Anthropic wants agents to be able to create new sub-agents and keep working in parallel without that constraint.

If achieved, asynchronous execution could make the system more flexible and faster. Agents would not have to operate in a fixed wave where every sub-agent completes before the next step begins. Instead, new work could be launched while other work is still underway.
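The difference from a fixed wave can be illustrated with Python's `asyncio.wait`. This is a toy model under assumptions, not a description of Anthropic's planned design. The `subagent` coroutine only sleeps, and the hypothetical `follow_ups` table is a hard-coded stand-in for a lead agent deciding to spawn new work when a result arrives.

```python
import asyncio


async def subagent(name: str, delay: float) -> str:
    # Stand-in for a sub-agent; the delay simulates uneven search times.
    await asyncio.sleep(delay)
    return name


async def run_async(
    initial: dict[str, float],
    follow_ups: dict[str, tuple[str, float]],
) -> list[str]:
    pending = {asyncio.create_task(subagent(n, d)) for n, d in initial.items()}
    finished: list[str] = []
    while pending:
        # Wake up as soon as any sub-agent finishes, not when all do.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            name = task.result()
            finished.append(name)
            if name in follow_ups:
                # Launch new work while slower sub-agents are still running.
                child, delay = follow_ups[name]
                pending.add(asyncio.create_task(subagent(child, delay)))
    return finished


if __name__ == "__main__":
    order = asyncio.run(
        run_async(
            {"fast": 0.01, "slow": 0.3},
            {"fast": ("fast-follow-up", 0.01)},
        )
    )
    print(order)
```

With these delays, the follow-up to the fast sub-agent should start and finish before the slow sub-agent returns. A strict wave cannot do this, because the follow-up would have to wait for the whole first batch.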

But the source also makes clear that this direction introduces unresolved challenges. Anthropic points to coordination, state management, and error handling as problems that have yet to be fully solved. Those are not minor details. A system with agents creating more agents has to keep track of what each component is doing, what information is current, and how mistakes are detected and contained.

Claude Research therefore shows both the promise and the limits of multi-agent AI. Parallel agents can improve complex search, and Anthropic's internal results show a large performance gain. But the same design also raises the cost of execution and makes orchestration more difficult. For agentic AI, the next step is not just more agents. It is better control over how they work together.