TechCrunch AI February 7, 2025 TERMINATOR

DeepMind AI clears a geometry benchmark built from IMO problems

Google DeepMind says AlphaGeometry2 solved 42 out of 50 translated geometry problems from International Mathematical Olympiad competitions. The result highlights a hybrid AI approach that combines a Gemini language model with a rules-based symbolic engine.

WTF Index TERMINATOR

Terminator 1, Idiocracy 0

The story mainly signals AI becoming more capable at formal reasoning, but with no clear danger or societal degradation angle.


Google DeepMind says its AlphaGeometry2 system has reached a notable mark in mathematical reasoning: solving geometry problems from the International Mathematical Olympiad at a level above the average gold medalist score.

The result matters because the system is not only producing answers. It is searching for formal proofs, using a mix of neural network prediction and rule-based symbolic reasoning to decide which steps can logically solve a geometry theorem.

What AlphaGeometry2 Solved

AlphaGeometry2 is an improved version of AlphaGeometry, a system DeepMind released in January 2024. In a newly published study, the researchers behind the system claim it can solve 84% of all geometry problems from the last 25 years of the International Mathematical Olympiad, a math contest for high school students.

For its main evaluation, the DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years, from 2000 to 2024. Those problems included linear equations and equations that require moving geometric objects around a plane.

The team then translated those problems into a larger set of 50 problems. Some problems had to be split into two for technical reasons.

According to the paper, AlphaGeometry2 solved 42 out of the 50 problems. That was above the average gold medalist score of 40.9.
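The reported numbers can be checked with a few lines of arithmetic. This is only a sanity check on the figures quoted above; the variable names are illustrative.

```python
# Figures as reported in the article.
solved = 42
total = 50
avg_gold_medalist = 40.9

# Fraction of the 50 translated problems that were solved.
solve_rate = solved / total

# How far the result sits above the average gold medalist score.
margin = solved - avg_gold_medalist

print(f"solve rate: {solve_rate:.0%}")
print(f"margin over average gold medalist: {margin:.1f} problems")
```

The 84% figure cited in the study is the 42 solved problems divided by the 50 translated ones.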

Why Geometry Is a Test for AI Reasoning

DeepMind’s interest in a high-school-level math contest is tied to a broader AI question. The lab believes that solving challenging Euclidean geometry problems could help point toward more capable AI systems.

Mathematical theorem proving demands more than pattern matching. A system must choose useful steps from many possible options, apply formal rules, and explain why a theorem is true. The Pythagorean theorem, for example, is the kind of result whose proof requires a logical explanation, not just a final answer.

Those skills could become useful components of future general-purpose AI models if DeepMind’s view is correct. The same kinds of approaches could also be extended to other areas of math and science, including complex engineering calculations.

DeepMind has already shown one example of this direction. In summer 2024, it demoed a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal math reasoning, to solve four out of six problems from the 2024 IMO.

How the System Works

AlphaGeometry2 combines several parts. One is a language model from Google’s Gemini family of AI models. Another is a symbolic engine that uses mathematical rules to infer solutions.

The geometry problems AlphaGeometry2 tackles are based on diagrams. Before a proof can be found, the diagram may need extra constructs, such as points, lines, or circles. The Gemini model predicts which constructs might help.

The symbolic engine then uses those suggested constructs while following mathematical rules. In plain terms, the Gemini model proposes formal steps and additions to the diagram, and the symbolic engine checks whether those steps are logically consistent.
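The propose-then-check division of labor can be illustrated with a toy sketch. This is not DeepMind's engine: facts are plain strings, the rules are a hand-written stand-in for geometric deduction, and `propose_constructs` is a hypothetical stub in place of the Gemini model. It only shows the control flow of a proposer suggesting an extra construct and a rule-based engine deciding whether the goal then follows.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def deduce_closure(facts: Set[str], rules: Iterable[Rule]) -> Set[str]:
    """Forward-chain: apply rules until no new fact can be derived."""
    known = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rule_list:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def propose_constructs(facts: Set[str]) -> List[str]:
    """Hypothetical stand-in for the language model: candidate extra constructs."""
    return ["aux:point_X_on_AB", "aux:midpoint_M_of_BC"]


def solve(premises: Set[str], goal: str, rules: List[Rule]) -> Optional[List[str]]:
    """Return the constructs that were needed, or None if the goal was not reached."""
    if goal in deduce_closure(premises, rules):
        return []  # provable from the diagram as given
    for construct in propose_constructs(premises):
        if goal in deduce_closure(premises | {construct}, rules):
            return [construct]
    return None


# Toy problem: in triangle ABC with AB = AC, show the base angles are equal.
# The rules encode one classic argument that uses the midpoint M of BC.
toy_rules: List[Rule] = [
    (frozenset({"aux:midpoint_M_of_BC"}), "BM=CM"),
    (frozenset({"AB=AC", "BM=CM"}), "congruent(ABM,ACM)"),
    (frozenset({"congruent(ABM,ACM)"}), "angle_B=angle_C"),
]

result = solve({"AB=AC"}, "angle_B=angle_C", toy_rules)
print(result)
```

In this toy, the rules alone cannot reach the goal from `AB=AC`. The first suggestion does not help either, and the proof only goes through once the midpoint is added, which is the kind of situation where a suggested construct matters.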

A search algorithm lets AlphaGeometry2 run multiple searches for solutions in parallel. It can also store potentially useful findings in a shared knowledge base.
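A rough sketch of parallel searches writing to a shared knowledge base follows. The article does not describe how DeepMind implements this, so everything here is an assumption for illustration: the `SharedKnowledge` class, the string facts, the toy rules, the use of threads, and the choice to publish only facts that do not name an auxiliary construct.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedKnowledge:
    """Thread-safe set of facts that every search worker can read and extend."""

    def __init__(self):
        self._facts = set()
        self._lock = threading.Lock()

    def add_all(self, facts):
        with self._lock:
            self._facts |= set(facts)

    def snapshot(self):
        with self._lock:
            return set(self._facts)


def closure(facts, rules):
    """Forward-chain over (premises, conclusion) rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def search_worker(premises, construct, goal, rules, shared):
    """Explore one construct, reusing and publishing shared findings."""
    known = closure(premises | {construct} | shared.snapshot(), rules)
    # Assumption: publish only derived facts that are not auxiliary constructs.
    shared.add_all(f for f in known - premises if not f.startswith("aux:"))
    return construct, goal in known


def parallel_search(premises, constructs, goal, rules, rounds=2):
    """Run one worker per construct; repeat so later rounds see earlier findings."""
    shared = SharedKnowledge()
    for _ in range(rounds):
        with ThreadPoolExecutor(max_workers=len(constructs)) as pool:
            results = list(pool.map(
                lambda c: search_worker(premises, c, goal, rules, shared),
                constructs,
            ))
        winners = [c for c, reached in results if reached]
        if winners:
            return winners, shared.snapshot()
    return [], shared.snapshot()


# Toy setup: each construct unlocks one intermediate fact; the goal needs both.
toy_rules = [
    (frozenset({"given:G", "aux:A"}), "fact:P"),
    (frozenset({"given:G", "aux:B"}), "fact:Q"),
    (frozenset({"fact:P", "fact:Q"}), "goal"),
]
winners, shared_facts = parallel_search(
    {"given:G"}, ["aux:A", "aux:B"], "goal", toy_rules
)
print(sorted(shared_facts))
```

Neither worker can reach the goal from its own construct alone. The goal becomes reachable only because each worker's intermediate finding is stored where the other can pick it up.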

The system counts a problem as solved when it reaches a proof that combines the Gemini model’s suggestions with the symbolic engine’s known principles.

Training Data Was a Major Challenge

Geometry proofs are difficult to turn into a format AI can use. Because of that, there is not much usable geometry training data available.

DeepMind addressed that gap by creating synthetic data to train AlphaGeometry2’s language model. The team generated over 300 million theorems and proofs of varying complexity.
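The article does not say how the synthetic theorems and proofs were produced. One plausible way to picture it, offered purely as an assumption, is to sample random starting facts, let a rule engine derive whatever follows, and record each derived fact as a theorem together with the steps that led to it. The atoms, rules, and function names below are all made up for the sketch.

```python
import random

# Toy rule set: (premises, conclusion). Letters stand in for geometric facts.
RULES = [
    (frozenset({"a", "b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"b", "e"}), "f"),
    (frozenset({"d", "f"}), "g"),
]
ATOMS = ["a", "b", "e"]


def derive_with_trace(premises, rules):
    """Forward-chain and record which rule premises produced each new fact."""
    known = set(premises)
    trace = {}  # conclusion -> premises of the rule that derived it
    changed = True
    while changed:
        changed = False
        for rule_premises, conclusion in rules:
            if conclusion not in known and rule_premises <= known:
                known.add(conclusion)
                trace[conclusion] = rule_premises
                changed = True
    return trace


def proof_of(fact, trace):
    """Walk the trace backwards and list the steps needed to reach `fact`."""
    steps = []
    seen = set()

    def visit(f):
        if f in trace and f not in seen:
            seen.add(f)
            for p in sorted(trace[f]):
                visit(p)
            steps.append((sorted(trace[f]), f))

    visit(fact)
    return steps


def generate_examples(n_samples, seed=0):
    """Sample random premise sets and emit one example per derived theorem."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_samples):
        k = rng.randint(1, len(ATOMS))
        premises = frozenset(rng.sample(ATOMS, k))
        trace = derive_with_trace(premises, RULES)
        for theorem in sorted(trace):
            examples.append({
                "premises": sorted(premises),
                "theorem": theorem,
                "proof": proof_of(theorem, trace),
            })
    return examples


full_trace = derive_with_trace({"a", "b", "e"}, RULES)
proof_g = proof_of("g", full_trace)
examples = generate_examples(20)
print(proof_g)
```

Different premise samples yield proofs of different lengths, from a single step up to four here, which loosely mirrors the "varying complexity" the team describes.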

That synthetic data helped the system learn how to suggest useful constructions and proof steps. But the final reasoning still depends on the symbolic engine’s rule-based checking.

This is central to why AlphaGeometry2 is interesting. It is neither a purely neural system nor a purely symbolic one. It uses a Gemini neural network model for suggestions and a rules-based engine for formal deduction.

Limits and the Bigger Debate

The result does not mean AlphaGeometry2 can solve every hard geometry problem. Several limitations remain. A technical quirk prevents the system from solving problems with a variable number of points, nonlinear equations, and inequalities.

It also performed less well on a second, harder problem set. For that test, the DeepMind team selected 29 problems that had been nominated for IMO exams by math experts but had not yet appeared in a competition. AlphaGeometry2 solved 20 of those, or about 69%.

The study is also part of a larger debate over how AI systems should be built. One side emphasizes neural networks, which learn from data and statistical approximation. Another side argues for symbolic systems, which manipulate symbols that represent knowledge using defined rules.

AlphaGeometry2 sits between those views. Its Gemini component uses a neural network architecture, while its symbolic engine is rules-based.

Vince Conitzer, a Carnegie Mellon University computer science professor specializing in AI, told TechCrunch: "It is striking to see the contrast between continuing, spectacular progress on these kinds of benchmarks, and meanwhile, language models, including more recent ones with ‘reasoning,’ continuing to struggle with some simple commonsense problems," adding, "I don’t think it’s all smoke and mirrors, but it illustrates that we still don’t really know what behavior to expect from the next system. These systems are likely to be very impactful, so we urgently need to understand them and the risks they pose much better."

According to the DeepMind paper, OpenAI’s o1 "reasoning" model, which also has a neural network architecture, could not solve any of the IMO problems that AlphaGeometry2 was able to answer.

DeepMind also reported preliminary evidence that AlphaGeometry2’s language model could generate partial solutions without the symbolic engine. Still, the team wrote that until model speed improves and hallucinations are completely resolved, tools like symbolic engines will remain essential for math applications.