Why Claude Opus 4.6 now leads the AI benchmark race

Claude Opus 4.6 is currently the top-ranked AI model on the Artificial Analysis Intelligence Index. The picture may change once Artificial Analysis finishes benchmarking OpenAI's Codex 5.3, especially in coding.

WTF Index: NEUTRAL (Terminator 1, Idiocracy 0)

This is mainly a routine benchmark update, with only a mild lean toward more powerful agentic AI capabilities.

Claude Opus 4.6 has moved into first place on the Artificial Analysis Intelligence Index, giving Anthropic's newest model a clear benchmark win across a broad mix of AI tasks. The lead is meaningful, but it is not yet the final word: OpenAI's Codex 5.3 is still being benchmarked and is expected to be especially strong in coding.

What The Ranking Measures

The Artificial Analysis Intelligence Index is not built around a single test. It is a composite of ten tests that cover coding, agent tasks, and scientific reasoning. That matters because the result is meant to reflect performance across different kinds of work rather than one narrow use case.

Claude Opus 4.6 leads the index overall. The model also took first place in agent-based work tasks, terminal coding, and physics research problems. Those categories show that the index judges a model not only on text generation, but also on practical task execution and reasoning-heavy work.

For readers comparing AI models, the key point is that the benchmark rewards breadth. A model can rank highly by doing well across several categories, and Claude Opus 4.6 is currently doing enough across the tested areas to hold the top position.

Why Codex 5.3 Still Matters

The ranking comes with an important caveat. Artificial Analysis has not finished benchmarking OpenAI's Codex 5.3. The source notes that Codex 5.3 will likely pull ahead in coding, which means the competitive picture could shift once those results are complete.

That does not erase Claude Opus 4.6's current position. It does mean the lead should be read as current rather than permanent. In fast-moving AI benchmarking, a model can be ahead overall while another model may be better suited to a specific category once full testing is done.

For teams that care most about coding, Codex 5.3 remains the model to watch. For teams looking at a broader mix of agent work, terminal coding, and scientific reasoning, Claude Opus 4.6 now has a benchmark result that puts it at the front of the field.

The Cost Behind The Score

The benchmark result also comes with a price tag. Running the complete test suite for Claude Opus 4.6 costs $2,486. That is higher than the $2,304 required for GPT-5.2 at maximum reasoning performance.

The cost comparison is notable because Claude Opus 4.6 used far fewer output tokens than GPT-5.2. Opus 4.6 consumed roughly 58 million output tokens, less than half of the 130 million GPT-5.2 used. Even so, Opus 4.6 used twice as many output tokens as Opus 4.5.

The source attributes the higher total price to Anthropic's token pricing: $5 per million input tokens and $25 per million output tokens. In other words, benchmark cost is not just about how many tokens a model produces. It also depends heavily on how those tokens are priced.

  • Complete Claude Opus 4.6 test suite cost: $2,486.
  • GPT-5.2 at maximum reasoning performance cost: $2,304.
  • Claude Opus 4.6 output tokens: roughly 58 million.
  • GPT-5.2 output tokens: 130 million.
  • Anthropic pricing: $5 per million input tokens and $25 per million output tokens.
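
The figures above can be checked with a little arithmetic. The Python sketch below is only an illustration, not Artificial Analysis's method. It uses the numbers reported in this article and makes one loud assumption: that the whole $2,486 bill consists of input and output tokens charged at the listed prices, with no caching discounts or other fees. The function names are made up for this example. GPT-5.2's token prices are not given in the source, so the sketch only derives a blended dollars-per-million-output-tokens figure for it.

```python
# Figures as reported in the article (token counts are approximate).
OPUS_TOTAL_USD = 2486        # full test suite, Claude Opus 4.6
GPT_TOTAL_USD = 2304         # full test suite, GPT-5.2 at maximum reasoning
OPUS_OUTPUT_MTOK = 58        # millions of output tokens, Opus 4.6
GPT_OUTPUT_MTOK = 130        # millions of output tokens, GPT-5.2
OPUS_INPUT_PRICE = 5         # USD per million input tokens
OPUS_OUTPUT_PRICE = 25       # USD per million output tokens


def output_cost(output_mtok, price_per_mtok):
    """Cost in USD of the output tokens alone."""
    return output_mtok * price_per_mtok


def implied_input_mtok(total_usd, output_usd, input_price):
    """Millions of input tokens implied by the rest of the bill.

    ASSUMPTION: the total is only input + output tokens at list price.
    """
    return (total_usd - output_usd) / input_price


def usd_per_output_mtok(total_usd, output_mtok):
    """Blended figure: total bill divided by output tokens.

    This includes input-token spend, so it is not a token price.
    """
    return total_usd / output_mtok


opus_output_usd = output_cost(OPUS_OUTPUT_MTOK, OPUS_OUTPUT_PRICE)
opus_input_mtok = implied_input_mtok(
    OPUS_TOTAL_USD, opus_output_usd, OPUS_INPUT_PRICE
)
opus_rate = usd_per_output_mtok(OPUS_TOTAL_USD, OPUS_OUTPUT_MTOK)
gpt_rate = usd_per_output_mtok(GPT_TOTAL_USD, GPT_OUTPUT_MTOK)

print(f"Opus 4.6 output-token cost: ${opus_output_usd:,.0f}")
print(f"Opus 4.6 implied input tokens: {opus_input_mtok:.1f}M (assumption)")
print(f"Opus 4.6 total per million output tokens: ${opus_rate:.2f}")
print(f"GPT-5.2 total per million output tokens: ${gpt_rate:.2f}")
```

Under that assumption, output tokens account for about $1,450 of the Opus 4.6 total, and each million output tokens is associated with more than twice as much total spend as for GPT-5.2. That is consistent with the article's point that pricing, not token volume, drives the higher bill.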

Where Claude Opus 4.6 Is Available

Claude Opus 4.6 is already available through several channels. The model can be accessed through the Claude.ai apps and through Anthropic's API. It is also available through Google Cloud's Vertex AI, Amazon Bedrock, and Microsoft Azure.

That distribution matters because benchmark results only become useful when users can actually try the model in the environments they already use. The listed access points cover direct app usage, API access, and major cloud platforms.

The practical takeaway is simple: Claude Opus 4.6 is the current top-ranked model on the Artificial Analysis Intelligence Index, with strong results in agent tasks, terminal coding, and physics research problems. But the race is still open, because Codex 5.3 has not yet been through the full benchmarking process and may change the coding leaderboard once it has.