Ars Technica AI March 26, 2025

Why Gemini 2.5 Pro raises the bar for Google's AI

Google says Gemini 2.5 Pro Experimental is its “most intelligent” model yet, with reasoning built in, multimodal input, and a 1 million token context window. It is available now in the Gemini app, on the web, and in Google’s AI Studio, with Vertex AI access coming soon.

Google is moving quickly from Gemini 2.0 to Gemini 2.5 Pro Experimental, and the upgrade is not just a branding change. The company is positioning the new model as its “most intelligent” system yet, built around reasoning, multimodality, a very large context window, and stronger performance on coding, math, and science tasks.

The result matters to both everyday Gemini users and developers watching the next phase of large language models. The core claims are simple: Gemini 2.5 Pro Experimental can take in far more context than many competing systems, produce long outputs, and handle complex prompts better than the earlier Gemini 2.0 models.

What Google changed with Gemini 2.5 Pro

Gemini 2.5 Pro Experimental arrives only a few months after Google released its first Gemini 2.0 AI models. According to Google, all of its models going forward will have reasoning built in, and Gemini 2.5 Pro is the first major example of that shift in this release cycle.

Ars Technica describes built-in reasoning as the model checking its own work while it generates an answer. This is not presented as human reasoning, and the article specifically uses the term “simulated reasoning” to make that distinction. Still, the practical goal is clear: better outputs from a model that can work through harder prompts before responding.

Google highlights coding as one area where this matters. The company cites the model’s “agentic” coding capabilities, and Ars Technica reports that Gemini 2.5 Pro Experimental can generate a full working video game from a single prompt. The article says that claim was tested with the publicly available version of the model and that it worked.

For users, this matters because coding tasks often require a model to keep track of several moving parts at once: the original request, the structure of the program, the logic of the interaction, and the final result. A model that performs better on that kind of task may also be more useful for other multi-step work, including technical explanation, structured planning, and complex question answering.

The context window is one of the biggest technical advantages

One of the most concrete specifications in Gemini 2.5 Pro Experimental is its 1 million token context window. Ars Technica notes that this is common for the big Gemini models but still massive compared with competing models such as OpenAI’s GPT or Anthropic’s Claude.

A large context window means the model can receive much more information in a single prompt. The article gives a plain example: a user could feed multiple very long books to Gemini 2.5 Pro in one prompt. That does not guarantee perfect understanding, but it changes what kinds of tasks are practical.
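To make the "multiple very long books" example concrete, here is a rough back-of-the-envelope sketch in Python. The words-per-token ratio and the book lengths are illustrative assumptions, not figures from Google or the article, and real token counts depend on the model's tokenizer.

```python
# Rough check of whether a set of long texts fits in a context window.
# ASSUMPTIONS (not from the article): about 0.75 English words per token is
# a common rule of thumb, and the word counts below are made-up examples.

CONTEXT_WINDOW_TOKENS = 1_000_000  # input limit quoted for Gemini 2.5 Pro Experimental
WORDS_PER_TOKEN = 0.75             # assumed rule of thumb; tokenizer-dependent


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a text of the given length in words."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_counts: list[int], window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated total token count fits in the window."""
    return sum(estimate_tokens(w) for w in word_counts) <= window


# Hypothetical input: three long novels of 200,000 words each.
books = [200_000, 200_000, 200_000]
print(sum(estimate_tokens(w) for w in books), fits_in_context(books))
```

Under these assumptions, three such books come to roughly 800,000 estimated tokens and fit in one prompt, while a fourth would push the total past the limit.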

The output limit is also large. Gemini 2.5 Pro Experimental's output maxes out at 64,000 tokens, the same as Gemini 2.0 Flash. Ars Technica describes that as objectively a lot of tokens compared with other LLMs.

Google also says the context window will be increased to 2 million tokens soon. That planned expansion matters because the current limit is already described as about five times the size of o3-mini’s input limit. If Google follows through, Gemini’s advantage on very long inputs could become even more visible.
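The relationships between these figures are easy to check with a little arithmetic. The sketch below uses only the numbers quoted above; because the comparison with o3-mini is stated as "about five times", the implied o3-mini limit is an approximation rather than an official specification.

```python
# Context-window arithmetic using only figures quoted in the article.
CURRENT_WINDOW = 1_000_000   # tokens, current input limit
PLANNED_WINDOW = 2_000_000   # tokens, announced future limit
MAX_OUTPUT = 64_000          # tokens, output cap
RATIO_VS_O3_MINI = 5         # "about five times", so only approximate

# Input limit for o3-mini implied by the "about five times" comparison.
implied_o3_mini_limit = CURRENT_WINDOW // RATIO_VS_O3_MINI

# How the same comparison would look after the planned expansion.
planned_ratio = PLANNED_WINDOW / implied_o3_mini_limit

# Size of a maximum-length output relative to the current input window.
output_share = MAX_OUTPUT / CURRENT_WINDOW

print(implied_o3_mini_limit)   # 200000
print(planned_ratio)           # 10.0
print(f"{output_share:.1%}")   # 6.4%
```

In other words, the planned 2 million token window would be roughly ten times the implied o3-mini input limit, and even a maximum-length response is small next to what the model can read in.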

Benchmarks and user preference both point upward

Google has run Gemini 2.5 Pro Experimental through a set of benchmarks, and Ars Technica says those results put it slightly ahead of other AI systems in several areas. On GPQA and AIME 2025, which measure complex science and math question answering, Gemini 2.5 Pro Experimental edges past OpenAI’s o3-mini.

The model also set a new record in Humanity’s Last Exam, a benchmark made of 3,000 questions curated by domain experts. Gemini 2.5 Pro Experimental scored 18.8 percent, compared with OpenAI’s 14 percent.
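A short calculation helps put the two scores in perspective. This is only a sketch: the conversion to a number of correct answers assumes every model was scored on all 3,000 questions, which the article does not state.

```python
# Comparing the two Humanity's Last Exam scores quoted in the article.
GEMINI_SCORE = 18.8      # percent
OPENAI_SCORE = 14.0      # percent
TOTAL_QUESTIONS = 3_000  # benchmark size quoted in the article

# Absolute gap in percentage points.
gap_points = GEMINI_SCORE - OPENAI_SCORE

# Relative improvement over the lower score.
relative_gain = gap_points / OPENAI_SCORE


def approx_correct(score_percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Approximate correct answers, ASSUMING the full question set was used."""
    return round(score_percent / 100 * total)


print(f"{gap_points:.1f} points, {relative_gain:.0%} relative")
print(approx_correct(GEMINI_SCORE), approx_correct(OPENAI_SCORE))
```

The gap is 4.8 percentage points, or about a third better in relative terms, yet even the record score means more than four out of five questions were answered incorrectly.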

Those numbers are notable, but Ars Technica also cautions that objective measures of AI capability are hard to interpret. Benchmarks can help compare systems, yet they do not always capture how useful a model feels in real work.

That is where user preference enters the picture. Gemini 2.5 Pro Experimental is already at the top of the LMSYS Chatbot Arena leaderboard. According to the source, that means users generally prefer its output over responses from OpenAI o3-mini, Grok, DeepSeek, and others.

The article’s own testing also supports the idea that this is a meaningful upgrade. Tasks that often confused Gemini 2.0 models were handled better by Gemini 2.5 Pro Experimental, with coding, math, and science questions trending stronger than previous Gemini versions.

Access, limits, and what comes next

Gemini 2.5 Pro Experimental is already available in several places. Users can access it in the mobile app and on the web, as well as in Google’s AI Studio. It will be in Vertex AI soon.

Google says Gemini 2.5 Pro is a drop-in replacement for 2.0 across Google’s products for anyone with a Gemini Advanced subscription. The subscription is listed at $20 per month.

There are limits, though. Google has not yet announced API pricing for Gemini 2.5 Pro Experimental. For now, the model is free to use and has the same 50-message daily limit as older experimental models.

That free status is not permanent. Google’s Logan Kilpatrick said on X, formerly Twitter, that 2.5 Pro Experimental will be the first experimental model with higher API limits and pricing. A later announcement is expected for those details.

The larger implication is that Google is treating this experimental model as more than a small preview. Between the reasoning focus, the 1 million token context window, the planned move to 2 million tokens, benchmark gains, and broader product availability, Gemini 2.5 Pro Experimental is a major step in Google’s current AI push.