OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique
Wasi Uddin Ahmad, Somshubra Majumdar, Aleksander Ficek, Sean Narenthiran, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Vahid Noroozi, Boris Ginsburg
2025-07-16
Summary
This paper introduces OpenCodeReasoning-II, a large dataset designed to help AI models improve at writing and reviewing computer code, especially for competitive programming tasks.
What's the problem?
The problem is that existing code datasets for training AI models often lack detailed reasoning and critique, which limits the models' ability to fully understand complex coding problems and to improve their own solutions.
What's the solution?
The authors created OpenCodeReasoning-II, which contains 2.5 million triples of programming questions, AI-generated solutions, and detailed critiques of those solutions. They used a two-stage training process in which the model first learns to generate code and then learns to critique code, which significantly improves its performance. They also extended the LiveCodeBench coding benchmark to better evaluate models on C++.
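The two-stage idea can be sketched with a small data-preparation example: each question-solution-critique triple yields one training example for the generation stage and one for the critique stage. This is a minimal sketch, assuming illustrative prompt templates and field names (`question`, `solution`, `critique`); it is not the paper's exact data format.

```python
def make_stage1_example(triple):
    """Stage 1: the model learns to generate a solution from the problem statement."""
    return {
        "prompt": f"Solve the following problem:\n{triple['question']}",
        "target": triple["solution"],
    }

def make_stage2_example(triple):
    """Stage 2: the model learns to critique a candidate solution."""
    return {
        "prompt": (
            f"Problem:\n{triple['question']}\n\n"
            f"Candidate solution:\n{triple['solution']}\n\n"
            "Critique this solution and state whether it is correct."
        ),
        "target": triple["critique"],
    }

# A toy triple (hypothetical content, for illustration only).
triple = {
    "question": "Given n, print the sum 1 + 2 + ... + n.",
    "solution": "n = int(input())\nprint(n * (n + 1) // 2)",
    "critique": "Uses the closed-form formula, which is correct and runs in O(1).",
}

stage1 = make_stage1_example(triple)  # generation example
stage2 = make_stage2_example(triple)  # critique example
```

At inference time, the same critique ability supports test-time scaling: the model can generate several candidate solutions, critique each one, and keep only the candidates its own critique judges correct.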
Why it matters?
This matters because it teaches AI models not just to write code but also to think critically about it, leading to more accurate solutions and advancing AI's ability to assist with real-world coding challenges.
Abstract
OpenCodeReasoning-II, a large dataset for code reasoning, enhances code generation and critique through a two-stage fine-tuning strategy, improving competitive coding performance and extending LiveCodeBench for C++.