Competitive Programming with Large Reasoning Models
OpenAI, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu
2025-02-12

Summary
This paper examines how large reasoning models, specifically OpenAI's o1 and o3, solve complex problems from competitive programming contests. The researchers tested these models against a specialized, domain-specific system and against human competitors to see how well they perform.
What's the problem?
Competitive programming requires high-level problem-solving skills that are difficult for AI systems to master. Previous AI models struggled to match human performance in these contests, especially without hand-crafted strategies tailored to each type of problem.
What's the solution?
The researchers used reinforcement learning to improve large language models, producing more general-purpose AI that can reason through complex problems. They tested several such models, including OpenAI's o1 and an early checkpoint of o3, on real contests such as the 2024 International Olympiad in Informatics (IOI), and compared these general-purpose models with o1-ioi, a specialized system built specifically for that competition.
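The summary does not spell out the reward used during reinforcement learning, but competitive programming lends itself to a verifiable signal: a generated program either passes a problem's test cases or it does not. The sketch below is a hypothetical illustration of such a reward function, assuming candidate solutions are C++ programs judged on input/output test pairs; it is not OpenAI's actual training setup.

```python
import os
import subprocess
import tempfile

def test_pass_reward(source_code: str,
                     test_cases: list[tuple[str, str]],
                     time_limit: float = 2.0) -> float:
    """Hypothetical verifiable reward: the fraction of (stdin, expected
    stdout) test cases a candidate C++ solution passes."""
    with tempfile.TemporaryDirectory() as work_dir:
        src_path = os.path.join(work_dir, "solution.cpp")
        bin_path = os.path.join(work_dir, "solution")
        with open(src_path, "w") as f:
            f.write(source_code)

        # A candidate that fails to compile earns zero reward.
        compiled = subprocess.run(["g++", "-O2", "-o", bin_path, src_path],
                                  capture_output=True)
        if compiled.returncode != 0:
            return 0.0

        passed = 0
        for stdin_text, expected_stdout in test_cases:
            try:
                result = subprocess.run([bin_path], input=stdin_text,
                                        capture_output=True, text=True,
                                        timeout=time_limit)
            except subprocess.TimeoutExpired:
                continue  # exceeding the time limit counts as a failed test
            if result.returncode == 0 and result.stdout.strip() == expected_stdout.strip():
                passed += 1
        return passed / len(test_cases) if test_cases else 0.0
```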
Why it matters?
This research matters because it shows that AI can now compete at a high level on complex problem-solving tasks without specialized programming for each type of problem. The o3 model achieved a gold medal at the 2024 IOI and performed on par with top human competitors on Codeforces. This suggests that improving general reasoning abilities could yield systems that tackle a wide range of complex problems in coding and other fields, bringing us closer to more versatile and capable artificial intelligence.
Abstract
We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpasses those results without relying on hand-crafted inference heuristics. Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art AI in reasoning domains, such as competitive programming.
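For context on what "hand-crafted test-time strategies" can look like, the sketch below shows one generic shape: sample many candidate programs, keep those that pass the problem's public example tests, and submit a survivor. The helper names `generate_candidate` and `passes_examples` are hypothetical, and the actual o1-ioi pipeline described in the paper is considerably more elaborate; this is only an illustration of the general idea.

```python
import random

def select_submission(generate_candidate, passes_examples,
                      num_samples: int = 50) -> str:
    """Illustrative sample-and-filter test-time strategy (not the o1-ioi
    pipeline): draw many candidates from the model, keep those that pass
    the public example tests, and submit one of the survivors."""
    survivors = []
    for _ in range(num_samples):
        candidate = generate_candidate()   # one model sample (source code)
        if passes_examples(candidate):     # all public example tests pass
            survivors.append(candidate)
    # If nothing passes the examples, fall back to an unfiltered sample.
    return random.choice(survivors) if survivors else generate_candidate()
```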