TechCrunch AI, December 19, 2024

Google tests Gemini reasoning model in AI Studio

Google has released Gemini 2.0 Flash Thinking Experimental, an early reasoning AI model available in AI Studio. The model is built on Gemini 2.0 Flash and is meant for multimodal understanding, reasoning, coding, math and physics, but early testing showed clear room for improvement.

A routine experimental reasoning-model launch mildly points toward more capable AI, but with no clear harm or control risk.

Google has entered the widening race to build reasoning AI models with Gemini 2.0 Flash Thinking Experimental, a new model now available in AI Studio. The release is framed as experimental, and early use suggests the technology is still uneven even as Google positions it around complex problem solving.

What Google Released

The model is called Gemini 2.0 Flash Thinking Experimental. It is available through AI Studio, Google's AI prototyping platform, and a model card describes it as "best for multimodal understanding, reasoning, and coding."

That same description says the model can "reason over the most complex problems" in areas including programming, math, and physics. In plain terms, Google is presenting the system as a version of Gemini that spends more effort working through a problem before it gives an answer.

Logan Kilpatrick, who leads product for AI Studio, described Gemini 2.0 Flash Thinking Experimental on X as "the first step in [Google's] reasoning journey." Jeff Dean, chief scientist for Google DeepMind, said in his own post that the model is "trained to use thoughts to strengthen its reasoning."

"We see promising results when we increase inference time computation," Dean said.

Inference time computation refers to the computing used to run the model while it considers a question. The point is not just that the model has been trained differently, but that it may also spend more compute during the answer process.
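One way to picture the idea is a toy simulation in Python. This is a sketch under stated assumptions, not a description of how Gemini works: the "solver" below is a hypothetical stand-in that returns the right answer only some of the time, and the extra inference-time compute is spent on majority voting over repeated samples, which is one known technique among several. The function names and the 40 percent error rate are made up for illustration.

```python
import random
from collections import Counter

CORRECT_ANSWER = 3  # the toy question has one known right answer


def noisy_solver(rng: random.Random, error_rate: float = 0.4) -> int:
    """Hypothetical stand-in for a single model sample (not a real API)."""
    if rng.random() < error_rate:
        return CORRECT_ANSWER - 1  # a plausible wrong answer
    return CORRECT_ANSWER


def majority_vote(rng: random.Random, samples: int) -> int:
    """Spend more compute per question: sample several times, keep the most common answer."""
    answers = [noisy_solver(rng) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, samples) == CORRECT_ANSWER for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):  # odd sample counts avoid ties
        print(f"{n:>2} samples per question: accuracy {accuracy(n):.2f}")
```

In this toy setup, accuracy rises as the number of samples per question grows, at the cost of proportionally more computation per answer. That is the same tradeoff the article describes for reasoning models.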

How Reasoning Models Differ

Gemini 2.0 Flash Thinking Experimental is built on Google's recently announced Gemini 2.0 Flash model. Its design appears similar to OpenAI's o1 and other reasoning models, according to the source article.

Reasoning models are meant to do more than produce a quick answer. They effectively fact-check themselves, which can help them avoid some issues that usually trip up AI models. That self-checking behavior is the core promise behind the category.

The tradeoff is speed. Reasoning models often take longer to respond, sometimes seconds to minutes longer, because they use more time to work through the problem before producing a final result.

In use, Gemini 2.0 Flash Thinking Experimental pauses before answering. It considers related prompts, explains its reasoning as it goes, and then summarizes what it treats as the most accurate answer.

That is the intended behavior. The early reality is less settled.

Early Testing Shows Limits

The model is still an early version, and the source article describes clear room for improvement. One test was especially simple: when asked how many R's were in the word "strawberry," Gemini 2.0 Flash Thinking Experimental answered "two."
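For reference, the count can be checked directly. The short Python snippet below is only an illustration of the arithmetic; the helper name count_letter is hypothetical and has nothing to do with how the model handles the question.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    # s-t-r-a-w-b-e-r-r-y contains the letter "r" three times.
    print(count_letter("strawberry", "r"))  # prints 3
```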

That example matters because it shows the gap between a reasoning AI model's stated purpose and its actual behavior in a basic task. A system can be designed to slow down and check itself, yet still make an error that looks obvious to a human reader.

This does not erase the model's broader goal. It does show why the experimental label is important. Google's release is not being presented as a finished endpoint, but as an early step in a field that is moving quickly.

Kilpatrick also pointed to a challenging puzzle involving visual and textual clues as an example of the model's early capabilities. The source article does not claim that this proves the model is reliable across tasks; it presents the example alongside the caution that results may vary.

The Wider Reasoning AI Race

Google is not alone in pursuing reasoning models. After the release of o1, rival AI labs began releasing their own systems in the same general category.

In November, DeepSeek, an AI research company funded by quant traders, launched a preview of its first reasoning model, DeepSeek-R1. That same month, Alibaba's Qwen team unveiled what it claimed was the first "open" challenger to o1.

Reports also indicate that Google had been working on this area before the public release. Bloomberg reported in October that several Google teams were developing reasoning models. The Information reported in November that the company had at least 200 researchers focused on the technology.

The source article connects this activity to a larger problem in generative AI: older brute force approaches to scaling up models are no longer delivering the same improvements they once did. Reasoning models offer one possible path for continued progress, especially on tasks where accuracy and multi-step problem solving matter.

What Comes Next

The promise of reasoning AI is straightforward: better answers on difficult questions in coding, math, physics, multimodal understanding, and other complex tasks. The challenge is equally clear: these models can be slower and more expensive to run because they require more computing power.

There is also uncertainty about whether the progress shown on benchmarks can continue at the same pace. Strong benchmark performance does not automatically settle how useful or dependable reasoning models will be in everyday use.

For Google, Gemini 2.0 Flash Thinking Experimental gives developers and AI builders a first look at where its reasoning work is heading inside AI Studio. For users, the practical takeaway is more cautious: this is a notable release, but still an experimental one.

The model's name signals ambition. Its early mistakes signal the hard work still ahead.