TechCrunch AI, March 25, 2025

Google pushes AI reasoning forward with Gemini 2.5 Pro

Google has introduced Gemini 2.5, a new family of AI reasoning models that can pause to "think" before responding. The first release, Gemini 2.5 Pro Experimental, is multimodal, available through Google AI Studio and Gemini Advanced, and positioned as Google's most intelligent model yet.

WTF Index: Terminator 2, Idiocracy 0

A routine model launch, but stronger reasoning capabilities aimed at future AI agents mildly push toward more powerful and autonomous AI systems.

Google is making reasoning a central part of its AI strategy with Gemini 2.5, a new model family designed to spend more time working through a question before producing an answer. The first model in the lineup is Gemini 2.5 Pro Experimental, a multimodal reasoning model that Google describes as its most intelligent model yet.

The launch puts Google more directly into the fast-moving race around AI reasoning models, a category that has become important for coding, math, and future AI agents. It also signals a broader shift inside Google: the company says all of its new AI models will include reasoning capabilities going forward.

What Google Released

Gemini 2.5 is a new family of AI reasoning models. The defining idea is that the model can pause to "think" before answering, using additional time and computing power to reason through a problem.

Google is starting the family with Gemini 2.5 Pro Experimental. It is a multimodal model, meaning it is built to handle more than one type of input, and it is being introduced as the company's most capable model so far.

The model is available starting Tuesday in two places: Google AI Studio, the company's developer platform, and the Gemini app for subscribers to Gemini Advanced, the company's $20-a-month AI plan.

This release follows an earlier Google experiment with reasoning. The company previously released a "thinking" version of Gemini in December, but Gemini 2.5 is being framed as a more serious effort to compete with OpenAI's "o" series of models.

Why Reasoning Models Matter

The wider technology industry has moved quickly into AI reasoning since OpenAI launched o1, its first AI reasoning model, in September 2024. Since then, Anthropic, DeepSeek, Google, and xAI have all introduced models in the same category.

Reasoning models differ from standard AI models because they use extra computing power and time before they answer. That additional work is intended to help them fact-check and reason through complex problems more carefully.

The source article highlights two areas where reasoning techniques have already mattered: math and coding. These are tasks where a model often needs to move through multiple steps, revise an approach, or connect several pieces of information before producing a useful response.

Many people in the tech world also see reasoning models as a key part of AI agents. These agents are described as autonomous systems that can complete tasks largely without human intervention. Better reasoning could make those systems more capable, though the article also notes a tradeoff: reasoning models are more expensive.

Benchmarks Show Strengths And Limits

Google says Gemini 2.5 Pro outperforms its previous frontier AI models and some major competing models on several benchmarks. The company specifically says Gemini 2.5 was designed to do well at creating visually compelling web apps and agentic coding applications.

On Aider Polyglot, an evaluation focused on code editing, Google says Gemini 2.5 Pro scores 68.6%. According to Google, that result beats top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek.

The picture is more mixed on SWE-bench Verified, another test that measures software development abilities. Gemini 2.5 Pro scores 63.8% on that benchmark. That puts it ahead of OpenAI's o3-mini and DeepSeek's R1, but behind Anthropic's Claude 3.7 Sonnet, which scored 70.3%.

Gemini 2.5 Pro also appears on Humanity's Last Exam, a multimodal test made up of thousands of crowdsourced questions across mathematics, humanities, and the natural sciences. Google says the model scores 18.8% there, performing better than most rival flagship models.

Taken together, the benchmarks present Gemini 2.5 Pro as a strong reasoning model, especially in coding-related evaluations. They also show that the competitive landscape remains uneven, with different models leading on different tests.

A Larger Context Window

One of Gemini 2.5 Pro's most notable technical features is its context window. Google says the model is launching with a 1 million-token context window, which allows it to take in roughly 750,000 words in one pass.

The source notes that this input length is longer than the entire "Lord of the Rings" book series. In practical terms, a larger context window gives a model more room to take in long documents, large codebases, or complex collections of material before generating a response.

Google also says Gemini 2.5 Pro will soon support 2 million tokens, doubling the initial input length. The article does not provide a specific date for that expansion.
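
To make the context window figures concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the ratio implied by the article's own numbers (1 million tokens for roughly 750,000 words) holds for other sizes, which is only an approximation: real token counts depend on the tokenizer, the language, and the content. The function names are hypothetical and are not part of any Google API.

```python
# Ratio implied by the article's figures: 1,000,000 tokens ~ 750,000 words.
# This is a rule of thumb, not an exact property of Gemini's tokenizer.
ARTICLE_TOKENS = 1_000_000
ARTICLE_WORDS = 750_000


def approx_words(tokens: int) -> int:
    """Estimate how many words fit in `tokens` tokens (rounded down)."""
    return tokens * ARTICLE_WORDS // ARTICLE_TOKENS


def approx_tokens(words: int) -> int:
    """Estimate the tokens needed for `words` words (rounded up)."""
    return -(-words * ARTICLE_TOKENS // ARTICLE_WORDS)


def fits(words: int, context_tokens: int) -> bool:
    """Return True if a text of `words` words likely fits in the window."""
    return approx_tokens(words) <= context_tokens


if __name__ == "__main__":
    launch_window = 1_000_000   # context window at launch, per Google
    planned_window = 2_000_000  # expansion Google says is coming soon

    print(approx_words(launch_window))
    print(approx_words(planned_window))
    # A hypothetical 1,000,000-word corpus, used only for illustration.
    print(fits(1_000_000, launch_window))
    print(fits(1_000_000, planned_window))
```

Under this assumption, the planned 2 million-token window would hold about 1.5 million words, so a hypothetical 1 million-word corpus that overflows the launch window would fit after the expansion.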

Pricing is also still open. Google has not published API pricing for Gemini 2.5 Pro, and the company says it will share more in the coming weeks.

What This Means For The AI Race

Gemini 2.5 Pro arrives at a moment when reasoning models are becoming a central competitive focus for major AI labs. OpenAI, Anthropic, DeepSeek, Google, and xAI are all now part of that contest.

For Google, the launch is about more than a single experimental model. By saying that future AI models will have reasoning capabilities built in, the company is making reasoning part of its default model roadmap.

The immediate impact will likely be clearest for developers and Gemini Advanced subscribers who can access Gemini 2.5 Pro Experimental. For developers, the emphasis on code editing, software development, visually compelling web apps, and agentic coding applications shows where Google wants the model to stand out.

The broader question is how much reasoning improves real-world usefulness, especially when the models cost more to run. Gemini 2.5 Pro gives Google a new answer to that question, but the benchmark results also show that no single model is clearly ahead everywhere.