TechCrunch AI July 22, 2025

Gold-Medal AI Math Scores Put OpenAI and Google Neck and Neck

OpenAI and Google DeepMind both said their AI models reached gold-medal scores in the 2025 International Mathematical Olympiad (IMO). The results point to fast progress in AI reasoning, while also exposing a dispute over how OpenAI announced and evaluated its performance.

Gold-medal IMO performance signals faster progress toward more powerful AI reasoning, but the story does not describe direct harm or autonomy.

OpenAI and Google DeepMind have both claimed gold-medal scores at the 2025 International Mathematical Olympiad, putting two of the most watched AI labs in a rare head-to-head moment on a famously difficult math benchmark.

The achievement matters because the IMO is not just another test. It is one of the world’s oldest and most challenging high-school-level math competitions, and it asks for proof-based answers rather than simple final results. For AI companies trying to show progress in reasoning, that makes the contest an especially persuasive showcase.

Why the 2025 IMO Result Matters

Both OpenAI and Google said their systems correctly answered five out of six questions on the IMO test. According to the companies, that was enough for gold-medal-level performance and a higher score than most of the high school students in the competition achieved.

The result also marks a shift from last year. Google previously earned a silver-medal score at the IMO using a “formal” system, which required people to translate the problems into a machine-readable format. This year, the companies used “informal” systems that could read the questions and produce natural-language, proof-based answers without that human translation step.

That distinction is central to the story. A system that needs problems rewritten for it can still be impressive, but it is operating in a more controlled pipeline. A system that can take natural language questions and return proof-based answers is closer to the way people engage with the contest itself.

Researchers from both companies told TechCrunch that the performances represented breakthroughs for AI reasoning models in non-verifiable domains. The phrase matters because many AI systems already do well when the answer can be checked cleanly, such as in simple math or coding tasks. More open-ended work, including complex research or subjective decisions, remains harder.

A Benchmark With Real Symbolic Weight

The IMO carries unusual importance in the AI race because competitive math has deep ties to the research community. Many AI researchers come from competitive math backgrounds, so a strong IMO result has more meaning inside the field than many public benchmarks.

It also feeds into the public contest over which lab appears to be ahead. OpenAI and Google are competing not only on products and model performance, but also on perception. TechCrunch described that as an intangible battle of “vibes” that can affect how companies attract top AI talent.

On the substance, the two companies looked closely matched. Both claimed five correct answers out of six. Both said their systems reached gold-medal scores. Both framed the result as evidence that AI reasoning models are improving quickly.

That closeness may be the most important takeaway. OpenAI has been seen as having a significant lead over the industry, but the IMO results suggest the race is now tighter than either company may want to admit.

The Dispute Over Process

The technical milestone quickly turned into a fight over evaluation and timing. Google DeepMind’s CEO and researchers criticized OpenAI on social media after OpenAI announced its result on Saturday morning, shortly after IMO had announced which high schoolers had won the competition on Friday night.

Demis Hassabis wrote, “Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved.”

Thang Luong, a Google DeepMind senior researcher and lead for the IMO project, told TechCrunch that Google waited to announce its results because it wanted to respect the students in the competition. He also said Google had been working with IMO’s organizers since last year and wanted the IMO president’s blessing and official grading before announcing.

Luong’s objection was about authority as much as etiquette. He told TechCrunch, “The IMO organizers have their grading guideline,” and added, “So any evaluation that’s not based on that guideline could not make any claim about gold-medal level [performance].”

OpenAI gave a different account of its route into the contest. Noam Brown, a senior OpenAI researcher who worked on the IMO model, said IMO had reached out a few months ago about participating in a formal math competition. OpenAI declined because it was focused on natural language systems that it considered more worth pursuing.

Brown also said OpenAI did not know IMO was conducting an informal test with Google. OpenAI said it hired third-party evaluators, specifically three former IMO medalists who understood the grading system, to grade its model’s performance. After learning it had reached a gold-medal score, Brown said OpenAI contacted IMO, which then told the company to wait until after the Friday night award ceremony before announcing.

IMO did not respond to TechCrunch’s request for comment.

What This Says About AI Reasoning

The disagreement does not erase the larger point: leading AI labs are moving quickly on reasoning. Even if Google went through a more official and rigorous process, the fact that multiple systems are now claiming this level of performance is itself significant.

TechCrunch frames the result against the students who competed this year. Countries from around the world sent their brightest students to IMO, and only a few percent of them scored as well as the AI models from OpenAI and Google.

That comparison should be treated carefully. The human students and the AI systems are not the same kind of participant, and the debate over evaluation shows why process matters. Still, the performance level is hard to ignore.

For OpenAI, the result arrives as the company is expected to release GPT-5 in the coming months. The company wants to project continued leadership in the AI industry. For Google, the officially graded result offers a way to argue that it is not behind and may be matching OpenAI in one of the most symbolically important areas of AI progress.

The practical lesson is that AI reasoning is becoming more competitive at the top. The public argument over who announced first and who followed the strongest grading process may continue, but the underlying signal is already clear: OpenAI and Google DeepMind are now close enough on elite math performance that neither can easily claim uncontested dominance.