OpenAI says one of its AI systems achieved a perfect score at the International Collegiate Programming Contest (ICPC) World Finals 2025, solving all 12 tasks under contest conditions. The company says that score would have ranked first had the system competed as a human team.
The result matters because ICPC is a demanding programming benchmark built around hard algorithmic problems, time pressure, and repeated attempts against an official judge. OpenAI’s claim is not only that its system performed well, but that it did so under the same basic conditions as student competitors.
A perfect run at ICPC
At the ICPC World Finals 2025, OpenAI’s system solved all 12 tasks. According to OpenAI, that result would have put it ahead of every human team in the contest.
The company says the system received the problems in the standard PDF format. It also worked within the five-hour time limit and submitted solutions directly to an official ICPC judge. Those submissions were evaluated alongside human entries.
OpenAI emphasized that the contest setup was not specially modified for the system. That point is central to the claim: the system was tested not on a customized version of the contest, but on the same problem set and through the same judging process as the human teams.
The headline number is simple, but the details matter. A perfect score at this event means the system did not merely handle the easier or more familiar parts of the problem set; it reached accepted solutions for every task, including the final and hardest one.
How it compared with humans and Gemini
The source article places OpenAI’s result beside two other reference points: the best human team and Google DeepMind’s Gemini 2.5 Deep Think system.
The best human team solved 11 out of 12 problems. Google DeepMind said its upgraded Gemini 2.5 Deep Think system reached gold-medal level at the same competition by solving 10 out of 12 problems.
Gemini’s result included Problem C, which no human team managed to solve, but Gemini missed two other problems. OpenAI’s system, by contrast, solved all 12.
That comparison makes OpenAI’s performance notable in two ways. First, it beat the top human result by one problem. Second, it surpassed a strong rival AI system that had already been presented as reaching gold-medal level.
- OpenAI’s system solved all 12 tasks.
- The best human team solved 11 out of 12.
- Gemini 2.5 Deep Think solved 10 out of 12.
- Gemini solved Problem C, which no human team solved.
The result does not erase the importance of the human performance. It does, however, show that AI systems are now being evaluated against the same kind of high-pressure, multi-problem programming contests that have long been used to identify elite student programmers.
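Why does a one-problem lead settle the comparison? Under the scoring rule generally published for ICPC contests, teams are ranked first by the number of problems solved, and total penalty time only separates teams with equal counts. Penalty time is the sum of each solved problem's acceptance time plus 20 minutes for every rejected run on that problem. The minimal sketch below assumes that rule and omits any further tie-breakers. The names `ProblemResult`, `Team`, and `standings` are hypothetical, and every submission time is invented, because the source reports only problem counts. Under this assumed rule, a problem accepted on its ninth submission, as OpenAI reports for the hardest task, would add 8 rejections at 20 minutes each, or 160 penalty minutes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumed ICPC-style rule: each rejected run on a problem that is
# eventually accepted adds 20 penalty minutes.
PENALTY_MINUTES_PER_REJECTION = 20


@dataclass
class ProblemResult:
    accepted_at: Optional[int]  # minutes from contest start; None = never accepted
    rejected_before: int = 0    # rejected runs before the accepted one


@dataclass
class Team:
    name: str
    results: list[ProblemResult]

    def solved(self) -> int:
        return sum(1 for r in self.results if r.accepted_at is not None)

    def penalty(self) -> int:
        # Unsolved problems add no penalty, however many times they were tried.
        return sum(
            r.accepted_at + PENALTY_MINUTES_PER_REJECTION * r.rejected_before
            for r in self.results
            if r.accepted_at is not None
        )


def standings(teams: list[Team]) -> list[Team]:
    # More problems solved ranks higher; penalty time only breaks ties.
    return sorted(teams, key=lambda t: (-t.solved(), t.penalty()))


# Hypothetical timings: the source reports problem counts, not times.
full_sweep = Team(
    "12-of-12 entry",
    [ProblemResult(20 * i) for i in range(1, 12)]
    + [ProblemResult(280, rejected_before=8)],  # accepted on the 9th submission
)
best_human = Team(
    "11-of-12 entry",
    [ProblemResult(15 * i) for i in range(1, 12)] + [ProblemResult(None)],
)

for team in standings([best_human, full_sweep]):
    print(team.name, team.solved(), team.penalty())
# 12-of-12 entry 12 1760
# 11-of-12 entry 11 990
```

Because the sort key puts the solved count before penalty time, the 12-problem entry ranks first even with a larger (invented) penalty total, which is why the comparison in this section rests on the 12, 11, and 10 counts.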
What OpenAI says powered the system
OpenAI says the system was not a single model acting alone. It was built as an ensemble of general-purpose reasoning models, and none of them were trained specifically for the ICPC.
The system used GPT-5 together with an internal experimental reasoning model. Both models generated candidate solutions. The experimental model then decided which solutions to submit.
GPT-5 handled most of the successful work. It produced correct answers for 11 of the 12 problems, and the experimental model selected those answers for submission.
The remaining task was different. OpenAI says the final and hardest problem was solved by the experimental model after GPT-5 struggled with it. That problem required a total of nine submissions before the solution was accepted.
This division of labor is one of the most useful details in the source. GPT-5 supplied most of the correct solutions, while the experimental reasoning model played both a selection role and, on the hardest problem, a direct problem-solving role.
OpenAI presents this as evidence that models able to reason more deeply and compute for longer can make progress on problems where other systems fail. The source also notes that GPT-5’s large share of the workload reinforces its position as one of the most capable AI models currently available to the public.
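The source gives only the outline of this setup: more than one model proposes candidate solutions, one model decides what to submit, and the judge's verdict determines whether another attempt is needed. The sketch below is a hypothetical, heavily simplified illustration of that generate, select, and submit pattern. It is not OpenAI's implementation. `Candidate`, `solve_with_ensemble`, the scoring function, and the toy generators are all invented for illustration. The fixed candidate pool is also an assumption; a real system would plausibly generate fresh attempts after each rejection.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical sketch: nothing here reflects OpenAI's actual system.


@dataclass
class Candidate:
    source: str  # which (hypothetical) model produced this candidate
    code: str    # candidate program text


def solve_with_ensemble(
    generators: Sequence[Callable[[str], list[Candidate]]],
    select_score: Callable[[Candidate], float],
    judge: Callable[[str], bool],
    problem: str,
    max_submissions: int = 10,
) -> tuple[Optional[Candidate], int]:
    """Pool candidates from every generator, then submit them in the
    selector's preferred order until the judge accepts one.

    Returns the accepted candidate (or None) and the number of submissions used.
    """
    # Every model contributes candidates to a shared pool.
    pool = [cand for gen in generators for cand in gen(problem)]
    # The selector ranks the pool; the highest score is submitted first.
    # sorted() is stable, so ties keep their generation order.
    ranked = sorted(pool, key=select_score, reverse=True)

    submissions = 0
    for cand in ranked:
        if submissions >= max_submissions:
            break
        submissions += 1
        if judge(cand.code):  # only an exact "accepted" verdict counts
            return cand, submissions
    return None, submissions


# Toy usage: both generators are hypothetical stand-ins, not real model calls.
def gen_a(problem: str) -> list[Candidate]:
    return [Candidate("model_a", "attempt_1"), Candidate("model_a", "attempt_2")]


def gen_b(problem: str) -> list[Candidate]:
    return [Candidate("model_b", "accepted_solution")]


best, used = solve_with_ensemble(
    [gen_a, gen_b],
    select_score=lambda c: 1.0 if c.source == "model_a" else 0.5,
    judge=lambda code: code == "accepted_solution",
    problem="toy problem",
)
print(best.source, used)  # model_b 3
```

In this toy run, the selector prefers the first model's candidates, both are rejected, and the second model's candidate is accepted on the third submission. Keeping selection separate from generation is the design choice being illustrated: any model can contribute a candidate, while one component decides what reaches the judge.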
Why the contest result matters
Programming contests are a narrow setting, but they test several capabilities at once. A system must interpret a problem statement, design a correct algorithm, produce code, test its reasoning indirectly through submissions, and recover when a solution fails.
The ICPC format adds pressure because the work must happen within a five-hour window. The official judge accepts or rejects submitted solutions, so success depends on exact correctness rather than a plausible explanation.
For that reason, OpenAI’s result is best understood as a strong signal about competitive programming performance. It does not prove that the system can solve every real-world software engineering problem. The source does not make that claim, and neither should readers.
What it does show is that OpenAI’s system could operate inside a demanding contest environment and reach accepted solutions across the full problem set. In that setting, it outperformed both the best human team and the Google DeepMind Gemini 2.5 Deep Think result described in the source.
Part of a wider benchmark streak
OpenAI frames the ICPC performance as part of a broader pattern. The same models have already produced gold-medal-level results at the International Mathematical Olympiad and the International Olympiad in Informatics.
Mostafa Rohaninejad, who worked on the project, described the ICPC result as a fitting conclusion to that run and pointed to the systems’ versatility. He also said the next frontier will be systems that can discover new knowledge, calling that the "true milestone."
That final point is the larger question behind the benchmark. Solving known contest problems under official conditions is a major technical result. Discovering new knowledge would be a different standard, and OpenAI’s own framing suggests that the company sees that as the next meaningful step.
For now, the ICPC result gives a concrete marker: OpenAI says a GPT-5-led ensemble solved all 12 tasks at the ICPC World Finals 2025 under contest conditions, with an official judge evaluating its submissions in parallel with human entries.