Ars Technica AI December 10, 2025 TERMINATOR

Open-weights AI coding gets a serious push from Devstral 2

Mistral AI has released Devstral 2, a 123 billion parameter open-weights coding model built for autonomous software engineering agents. The launch also includes Mistral Vibe, a terminal-based development tool that can inspect projects, edit multiple files, and run shell commands.

WTF Index TERMINATOR

The story mildly leans Terminator because it highlights more capable autonomous coding agents that can edit files and run shell commands, though it is mostly a product launch.

Mistral AI is pushing further into AI-assisted software development with Devstral 2, an open-weights coding model designed to work inside an autonomous software engineering agent. The release matters not only because of the model’s benchmark result, but because Mistral paired it with a developer-facing command line tool built for real project work.

What Mistral Released

Devstral 2 is a 123 billion parameter open-weights coding model from French AI startup Mistral AI. It is built for software engineering workflows where an AI system needs to understand a project, make changes, and help move code toward a working patch.

The model scored 72.2 percent on SWE-bench Verified, a benchmark that tests whether AI systems can solve real GitHub issues. By the source article’s account, that result places Devstral 2 among the stronger open-weights coding models.

Mistral also introduced Mistral Vibe, a command line interface that lets developers use Devstral models directly from a terminal. The tool is similar in broad purpose to Claude Code, OpenAI Codex, and Gemini CLI.

Mistral Vibe can scan file structures and Git status so it can preserve context across a project. It can also make changes across multiple files and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.
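To make the idea of scanning file structure and Git status concrete, here is a minimal Python sketch of how a terminal tool could gather that kind of context with standard Git commands. It is not Mistral Vibe's implementation, which the article does not describe. The function names `parse_porcelain` and `collect_repo_context` are hypothetical, and the sketch assumes `git` is installed and on the PATH.

```python
import subprocess


def parse_porcelain(output):
    """Parse `git status --porcelain` (v1) text into (status, path) pairs.

    Each line is a two-character status code, a space, then the path.
    Renames appear as "old -> new" and are left unsplit in this sketch.
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        entries.append((line[:2], line[3:]))
    return entries


def collect_repo_context(repo_dir="."):
    """Hypothetical helper: list tracked files and pending changes."""

    def run_git(*args):
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,  # raise if git fails, e.g. outside a repository
        )
        return result.stdout

    return {
        "tracked_files": run_git("ls-files").splitlines(),
        "changes": parse_porcelain(run_git("status", "--porcelain")),
    }


if __name__ == "__main__":
    context = collect_repo_context(".")
    print(len(context["tracked_files"]), "tracked files")
    for status, path in context["changes"]:
        print(repr(status), path)
```

A real agent would need far more than this, including limits on how much it reads and safeguards around any commands it runs.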

Why SWE-bench Verified Matters

Benchmarks should be treated carefully, especially in AI. A score does not prove that a model will perform well in every repository, team process, or production setting.

Still, SWE-bench Verified is watched closely by major AI companies, according to the source article. It presents AI models with 500 real software engineering problems taken from GitHub issues in popular Python repositories.

The task is more involved than simply generating a code snippet. The AI must read the issue description, move through the codebase, and produce a patch that passes unit tests.

There is also an important limitation. Some AI researchers have noted that around 90 percent of the benchmark’s tasks are relatively simple bug fixes that experienced engineers could complete in under an hour. Even with that caveat, the benchmark remains one of the few standardized ways to compare coding models.

The Smaller Model And Local Use

Mistral released Devstral Small 2 alongside Devstral 2. Devstral Small 2 is a 24 billion parameter model that scores 68 percent on SWE-bench Verified.
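For a sense of scale, the reported percentages can be converted into approximate task counts against the 500 problems in SWE-bench Verified. The Python sketch below does only that arithmetic. It assumes each score was computed over all 500 problems, which the article does not state, and the helper name `implied_solved` is illustrative.

```python
TOTAL_TASKS = 500  # SWE-bench Verified problem count cited in the article

# Reported scores in percent, as given in the article.
REPORTED_SCORES = {
    "Devstral 2": 72.2,
    "Devstral Small 2": 68.0,
}


def implied_solved(score_percent, total=TOTAL_TASKS):
    """Approximate number of tasks solved, assuming the full set was run."""
    return round(total * score_percent / 100)


for model, score in REPORTED_SCORES.items():
    print(f"{model}: about {implied_solved(score)} of {TOTAL_TASKS} tasks")

gap = implied_solved(72.2) - implied_solved(68.0)
print(f"Implied gap: about {gap} tasks")
```

Under that assumption, the scores correspond to roughly 361 and 340 solved tasks, a gap of about 21 problems between the 123 billion and 24 billion parameter models.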

The smaller model is notable because it can run locally on consumer hardware such as a laptop, with no Internet connection required. That makes it relevant for developers who want local AI coding help rather than relying only on hosted APIs.

Both Devstral 2 and Devstral Small 2 support a 256,000 token context window. That allows them to process moderately large codebases, though whether a codebase feels large or small depends heavily on the complexity of the project.
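One rough way to judge whether a project fits in that window is to estimate its token count from file sizes. The Python sketch below assumes about four characters per token, a common rule of thumb rather than a property of Mistral's tokenizer, so the result is only a ballpark. The function names, the default `.py` filter, and the `reserve_tokens` allowance for instructions and model output are all illustrative assumptions.

```python
import os

CONTEXT_WINDOW_TOKENS = 256_000  # context window size cited in the article
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not Mistral's tokenizer


def estimate_tokens(root, extensions=(".py",)):
    """Roughly estimate the token count of source files under `root`."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git so they are not walked.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total_chars += len(handle.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root, reserve_tokens=16_000, extensions=(".py",)):
    """True if the estimate plus a reserve fits in the context window."""
    return estimate_tokens(root, extensions) + reserve_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    tokens = estimate_tokens(".")
    print(f"Estimated tokens: {tokens}")
    print(f"Fits in a 256,000 token window: {fits_in_context('.')}")
```

By this estimate, 256,000 tokens corresponds to roughly one megabyte of source text. Agentic tools typically load only part of a repository at a time, so a project larger than that can still be workable.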

The two models also use different licenses. Mistral released Devstral 2 under a modified MIT license, while Devstral Small 2 uses the more permissive Apache 2.0 license.

Pricing And Cost Claims

Devstral 2 is currently free to use through Mistral’s API. After that free period ends, Mistral says pricing will be $0.40 per million input tokens and $2.00 per million output tokens.

Devstral Small 2 is priced lower, at $0.10 per million input tokens and $0.30 per million output tokens.

Mistral says Devstral is about “7x more cost-efficient than Claude Sonnet at real-world tasks.” The source article compares that with Anthropic’s Sonnet 4.5 through the API, which costs $3 per million input tokens and $15 per million output tokens, with increases depending on total token use.
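The list prices above can be compared directly with a little arithmetic. The Python sketch below prices one hypothetical job at each model's quoted rates. The workload of 1,000,000 input tokens and 200,000 output tokens is an arbitrary assumption chosen for illustration. The sketch ignores the current free period and any usage-dependent increases on Sonnet 4.5.

```python
# Prices in US dollars per million tokens, as quoted in the article.
PRICES = {
    "Devstral 2": {"input": 0.40, "output": 2.00},
    "Devstral Small 2": {"input": 0.10, "output": 0.30},
    "Claude Sonnet 4.5": {"input": 3.00, "output": 15.00},
}


def job_cost(model, input_tokens, output_tokens):
    """Return the API cost in dollars for one job at the listed rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 200_000

for model in PRICES:
    cost = job_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.2f}")
```

For this workload the sketch gives $0.80 for Devstral 2, $0.16 for Devstral Small 2, and $6.00 for Sonnet 4.5. Sonnet 4.5's list rates are 7.5 times Devstral 2's for both input and output tokens, so that ratio holds for any mix of the two. Mistral's "7x" figure is a claim about cost efficiency on real-world tasks, which also depends on how many tokens each model uses and how often it succeeds, so this calculation does not reproduce or verify it.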

For developers and engineering teams, the practical question is not only whether the model is cheaper per token. It is whether the model can complete useful work accurately enough, with enough context, and with enough reliability to justify putting it into a real workflow.

The Vibe Coding Question

The name Mistral Vibe points directly at “vibe coding,” a term AI researcher Andrej Karpathy coined in February 2025. The term describes a style of programming where developers describe what they want in natural language and accept AI-generated code without closely reviewing it.

Karpathy described the approach as being able to “fully giv[e] in to the vibes, embrace exponentials, and forget that the code even exists.” Collins Dictionary named it Word of the Year for 2025.

The idea has attracted both interest and caution. In an interview with Ars Technica in March, developer Simon Willison said, “I really enjoy vibe coding. It’s a fun way to try out an idea and prove if it can work.” He also warned that “vibe coding your way to a production codebase is clearly risky. Most of the work we do as software engineers involves evolving existing systems, where the quality and understandability of the underlying code is crucial.”

That warning frames the real stakes for Devstral 2. Mistral is not just offering another model that can generate code. It is betting that Devstral 2 can maintain coherence across whole projects, detect failures, retry with corrections, and handle more serious software engineering work than prototypes and in-house tools.

The company says the model can track framework dependencies and handle repository-scale tasks such as bug fixing and modernizing legacy systems. Those claims point to where AI coding tools are heading: away from isolated completions and toward agents that understand enough of a codebase to make coordinated changes.

The open question is how consistently that works outside a benchmark. For now, Devstral 2 gives open-weights AI coding a stronger position in a field still dominated by proprietary systems.