Ars Technica AI February 24, 2025 TERMINATOR

How Claude 3.7 Sonnet pushes reasoning into coding work

Anthropic announced Claude 3.7 Sonnet with an optional “extended thinking” mode for step-by-step problem solving. The release also brings Claude Code, a limited research preview of a command line coding agent aimed at software development workflows.

WTF Index TERMINATOR

◄ Terminator 2 Idiocracy 1 ►

A more capable reasoning model and command line coding agent mildly increase AI autonomy in software work, though this is mostly a routine product launch.

How Claude 3.7 Sonnet pushes reasoning into coding work

Anthropic has introduced Claude 3.7 Sonnet, a new AI language model built around a choice that matters for everyday use: fast replies when speed is enough, or “extended thinking” when a problem needs more careful work.

The company is positioning the model as its first “hybrid reasoning model,” and the launch puts coding near the center of the story. Alongside the model, Anthropic also revealed Claude Code, a command line AI agent for developers that is currently available as a limited research preview.

A model that can slow down on purpose

The defining feature of Claude 3.7 Sonnet is its simulated reasoning capability, called “extended thinking.” When enabled, the system can work through a problem step by step before producing an answer.

Anthropic describes the model as giving users a practical switch between quick output and more deliberate visible chain-of-thought processing. The source compares that category with OpenAI’s o1 and o3 series models, Google’s Gemini 2.0 Flash Thinking, and DeepSeek’s R1.

For API users, the control is more specific. Developers can set how many tokens Claude 3.7 Sonnet may spend on thinking, up to its 128,000 token output limit. That means the reasoning budget becomes something developers can tune based on the task, rather than a fixed behavior hidden behind the model.

The pricing structure did not change with this launch. API pricing remains $3 per million input tokens and $15 per million output tokens. Thinking tokens are counted in output pricing because they are part of the context the model uses.

Availability, plans, and refusals

Claude 3.7 Sonnet is available across all Claude subscription plans. The extended thinking mode is available on every plan except the free tier.

Anthropic also said it reduced unnecessary refusals in 3.7 Sonnet by 45 percent. In practical terms, the model should be less likely to reject harmless requests because it has interpreted boundaries too broadly.

That change is notable because Claude 3.5 Sonnet had a reputation in the AI world for being cautious. The update suggests Anthropic is trying to keep safety behavior while reducing cases where the model declines a request that should be answerable.

The release also reflects a naming adjustment. Claude 3.5 Sonnet launched in June 2024 and received an update in October with a nearly identical name. Some users found that confusing and informally called the October version “Claude 3.6 Sonnet.” On the Claude 3.7 release page, Anthropic wrote, “Lesson learned on naming.”

Why developers are watching Claude 3.7 Sonnet

Benchmark claims in the source point to coding as the model’s strongest area. Anthropic says Claude 3.7 Sonnet achieved top scores on SWE-bench Verified, which evaluates how AI models handle real-world software issues, and on TAU-bench, which tests AI agents on complex tasks involving users and tools.

The source also notes that Claude 3.5 Sonnet had already been strong at programming tasks compared with other AI models in the author’s experience. Claude 3.7 Sonnet appears to continue that emphasis, with Anthropic highlighting early testing around software work.

For developers, the model release is not only about writing code. Anthropic has expanded its GitHub integration to all Claude plans, letting developers connect code repositories directly to Claude. The stated use cases include bug fixes, feature development, and documentation work.

That matters because coding assistants become more useful when they can see the surrounding project context. A model that can inspect a repository is better positioned to respond to real tasks than one that only sees a copied snippet.

Usage limits remain a potential question for heavy users. The source says Anthropic has not announced a subscription plan beyond the existing “Claude Pro” at $20/month that might extend them.

Claude Code brings the agent into the terminal

Anthropic’s other major announcement is Claude Code, described as the company’s first agentic tool. It runs from a console terminal and is designed as an autonomous coding assistant.

According to the source, Claude Code can search codebases, read and edit files, write and run tests, commit and push code to GitHub repositories, and execute command line tools. It is also meant to keep developers informed while it works.

Anthropic is aiming the tool at debugging and refactoring as well as general coding assistance. The company claims that, in internal testing, Claude Code completed tasks in a single session that would typically require 45-plus minutes of manual work.

That claim is important because it frames Claude Code less as a chat interface and more as a workflow tool. Instead of only suggesting changes, the agent can operate in the same environment where developers already inspect files, run commands, and manage repositories.

Still, Claude Code is not being presented as a finished general release. It is available only as a “limited research preview,” and Anthropic says it plans to improve the tool based on user feedback over time.

Early tests show the promise and the limits

The source includes a brief hands-on look at extended thinking. One test asked Claude 3.7 Sonnet about the origin of the “magenta” color name. With extended thinking enabled, the model gave a firm “no” followed by an explanation, which the source described as impressive in comparison with prior experience.

Another informal test asked Claude 3.7 Sonnet to compose five original dad jokes that are not found anywhere in the world. The model attempted the task, though the source left the quality of the jokes for readers to judge.

Those examples are modest, but they show the broader point of simulated reasoning models. The value is not only in getting longer answers. It is in letting the model spend more of its output budget on working through uncertainty before it replies.

Claude 3.7 Sonnet is now available through the Claude website, the Claude app, Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. With extended thinking, unchanged API pricing, expanded GitHub integration, and Claude Code entering preview, Anthropic is making a clear push toward AI systems that can support deeper software work rather than simple one-shot answers.