
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman

2025-04-24

Summary

This paper describes how the authors built a top-performing AI system for solving hard math problems, like those found in math olympiads, by combining a huge collection of math questions with carefully designed training techniques.

What's the problem?

The problem is that most AI models struggle with complex math reasoning, especially when problems require many steps, creative thinking, or careful checking of intermediate work, all of which are common in advanced math competitions.

What's the solution?

To solve this, the team assembled a massive dataset of high-quality math problems and detailed solutions, including olympiad-level questions. They also developed a way for the AI to write and run computer code while working through tough problems (tool-integrated reasoning), and trained it to generate many possible solutions and pick the best one using a selection process called GenSelect. This approach made their models much better at handling tricky math questions than older methods.
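To give a feel for the difference between simply counting answers and GenSelect-style selection, here is a minimal sketch. The `judge` callable, helper names, and prompt wording are hypothetical placeholders for illustration only, not the paper's released pipeline.

```python
from collections import Counter

def majority_vote(candidate_answers):
    # Baseline: whichever final answer appears most often across the sampled
    # solutions is returned.
    counts = Counter(candidate_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

def generative_select(problem, candidate_solutions, judge):
    # GenSelect-style idea: a generative model reads the problem together with
    # several full candidate solutions and names the most promising one,
    # instead of only counting final answers.
    prompt = f"Problem:\n{problem}\n\nCandidate solutions:\n"
    for i, solution in enumerate(candidate_solutions):
        prompt += f"[{i}] {solution}\n\n"
    prompt += "Reply with the index of the candidate most likely to be correct."
    choice = judge(prompt)  # hypothetical LLM call returning e.g. "2"
    return candidate_solutions[int(choice.strip())]
```

The selection step can recover a correct solution even when it is not the most frequent answer, which is why the paper reports it improving on the majority-voting baseline.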

Why it matters?

This matters because it pushes the limits of what AI can do in math, helping students, teachers, and researchers tackle challenging problems. By sharing their code and data openly, they also make it easier for others to build on their work and keep improving AI’s math skills.

Abstract

This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon majority voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license.
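As a rough illustration of the tool-integrated reasoning idea mentioned in the abstract, the sketch below alternates between model-generated text and executing any Python code the model emits, feeding the output back into the context. The `model` callable, loop structure, and prompt are assumptions for illustration, not the authors' actual implementation.

```python
import re
import subprocess

def solve_with_code_execution(problem, model, max_rounds=4):
    # Hypothetical loop: the model reasons in text and may emit ```python``` blocks;
    # each block is executed, and its output is appended to the context so later
    # reasoning steps can build on the computed result.
    context = f"Problem: {problem}\nReason step by step. You may write Python code."
    for _ in range(max_rounds):
        step = model(context)  # assumed text-generation call
        context += "\n" + step
        code_blocks = re.findall(r"```python\n(.*?)```", step, flags=re.DOTALL)
        if not code_blocks:
            break  # the model finished without requesting code execution
        result = subprocess.run(
            ["python", "-c", code_blocks[-1]],
            capture_output=True, text=True, timeout=30,
        )
        context += f"\n[code output]\n{result.stdout or result.stderr}"
    return context
```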