The Decoder June 30, 2026 TERMINATOR

Claude Sonnet 5 pushes cheaper AI closer to Opus performance

Anthropic has released Claude Sonnet 5, a more agentic model that can plan, use tools, and work more independently than earlier Sonnet versions. Benchmarks show it beating Sonnet 4.6 across tested categories and, on one knowledge work benchmark, narrowly topping Opus 4.8.

WTF Index TERMINATOR

◄ Terminator 2 Idiocracy 1 ►

The story mainly describes a routine model launch, but its emphasis on more autonomous planning and tool use gives it a mild Terminator lean.

Claude Sonnet 5 pushes cheaper AI closer to Opus performance

Anthropic has introduced Claude Sonnet 5, positioning the model as a major step forward for its mid-tier Sonnet line. The company describes it as the most agentic Sonnet yet, meaning it can create plans, use tools such as browsers and terminals, and handle more work on its own.

The launch matters because Sonnet 5 appears to narrow the distance between the Sonnet family and the larger, more expensive Opus models. According to Anthropic's published results, the new model improves on Sonnet 4.6 in every tested category and comes close to Opus 4.8 in several areas.

Why Claude Sonnet 5 matters

Claude Sonnet 5 is designed to bring stronger autonomous work capabilities to a model tier that has traditionally sat below Opus. Anthropic says the model can build plans and choose tools like browsers and terminals, which makes it more useful for tasks that require multiple steps rather than a single answer.

That shift is important for AI users who rely on models for coding, research, search, and general knowledge work. A model that can plan and use tools effectively can spend more time executing a workflow, checking information, or operating inside a terminal-like environment.

Anthropic's framing is clear: Sonnet 5 is meant to close the gap with bigger, pricier models. The benchmark data in the source supports that direction, even though Opus 4.8 still leads in some categories.

The benchmark picture

Anthropic's published benchmarks show a broad improvement over Sonnet 4.6. On agentic coding, Sonnet 5 reaches 63.2 percent on SWE-bench Pro. Sonnet 4.6 scores 58.1 percent, while Opus 4.8 remains ahead at 69.2 percent.

The gains are especially visible on Terminal-Bench 2.1. Sonnet 5 scores 80.4 percent there, compared with 67.0 percent for Sonnet 4.6. That result fits Anthropic's claim that the model is better suited to tool-heavy tasks.

For multidisciplinary reasoning, measured by Humanity's Last Exam with tools, Sonnet 5 reaches 57.4 percent. That nearly matches Opus 4.8 at 57.9 percent. On computer use, measured by OSWorld-Verified, Sonnet 5 posts 81.2 percent, up from 78.5 percent for Sonnet 4.6.

The most striking comparison comes from GDPval-AA v2, which tests AI on real-world knowledge tasks. There, Sonnet 5 scores 1,618, just ahead of Opus 4.8 at 1,615. Anthropic says early-access partner feedback pointed in the same direction, with the model acting more agentically than earlier versions, including in search tasks.

Where safety fits into the launch

The Sonnet 5 release arrives in a broader context for Anthropic. The source notes that the US government is blocking the company's two most capable models, Mythos 5 and Fable 5, over cybersecurity concerns. That makes the security discussion around Sonnet 5 especially visible.

Anthropic says Sonnet 5 was not trained on cybersecurity tasks. In tests for risky capabilities such as writing software exploits, the model scores far below both Opus 4.8 and Mythos 5. The company views the overall cybersecurity risk from Sonnet 5 as low.

Even so, Sonnet 5 scores somewhat higher than its predecessor on those tasks. Anthropic has therefore enabled cyber safeguards by default. These protections flag and block risky cyber usage in real time and are described as being on par with the safeguards already used for Claude Opus 4.7 and 4.8.

The guardrails are not as strict as those used for Fable 5, which users complained about almost immediately. Anthropic also says Sonnet 5 does better than Sonnet 4.6 at refusing malicious requests and resisting prompt injection attacks. Hallucinations and sycophantic behavior, meaning the tendency to agree with the user, are also down, according to the company.

Availability, pricing, and developer details

Claude Sonnet 5 is available now across Anthropic platforms. It is the new default for Free and Pro users, while Max, Team, and Enterprise subscribers can also access it. Developers can use it through Claude Code and the Claude Platform.

For API users, the model name is claude-sonnet-5. The training cutoff is January 2026, and the model supports a one-million-token context window.

Anthropic is also offering introductory pricing. Until August 31, 2026, Claude Sonnet 5 costs $2 per million input tokens and $10 per million output tokens. After that, pricing rises to $3 per million input tokens and $15 per million output tokens, matching the cost of previous Sonnet models.

There is an important practical caveat. Because Sonnet 5 works more agentically, it may use more tokens per task. That means the real cost of completing work with the model could be higher than the headline per-token price suggests. The source notes that a similar pattern appeared when Opus moved from 4.6 to 4.7.

What this means for AI users

Claude Sonnet 5 does not replace the Opus line in every respect. Opus 4.8 still leads on some benchmark results, including SWE-bench Pro and Humanity's Last Exam with tools. But Sonnet 5's results show that stronger planning, tool use, and knowledge work performance are moving into a more accessible model tier.

For users choosing between Claude models, the key question is no longer just whether Sonnet is cheaper than Opus. It is whether Sonnet 5 can now handle enough complex work to reduce the need for a larger model in some workflows.

The answer will depend on the task. The benchmark results suggest clear progress in agentic coding, terminal-based tasks, computer use, and real-world knowledge work. The pricing details suggest teams will still need to watch token consumption closely, especially when the model uses tools and works through longer plans.