Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

2025-05-07

Summary

This paper introduces the Absolute Zero Reasoner (AZR), an AI system that teaches itself to solve coding and math problems by practicing and checking its own work, without any external training data.

What's the problem?

Most AI models learn to solve new tasks from large sets of human-written example problems and answers. These examples are time-consuming and expensive to collect, especially for complex subjects like math and programming.

What's the solution?

The researchers built an AI that writes its own practice problems, attempts to solve them, and then checks whether its answers are correct, so it improves at reasoning and problem-solving on its own.
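To make the propose, solve, and check loop concrete, here is a toy sketch in Python. It is not the authors' implementation: the arithmetic task format, the three candidate strategies, the weight-update rule, and every function name are assumptions chosen for illustration. A proposer invents small arithmetic tasks, a solver picks a strategy, and a verifier executes the task to decide whether the solver earns a reward. No dataset of solved examples is used.

```python
import random

# Candidate strategies the toy solver can choose from (hypothetical).
STRATEGIES = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
# Meaning of each symbol; consulted only by the verifier, never by the solver.
SYMBOLS = {"+": "add", "-": "sub", "*": "mul"}


def propose(rng):
    """Proposer: invent a task (a, symbol, b) without consulting any dataset."""
    return rng.randint(1, 9), rng.choice(list(SYMBOLS)), rng.randint(1, 9)


def verify(task, answer):
    """Verifier: execute the task and compare with the solver's answer."""
    a, symbol, b = task
    return answer == STRATEGIES[SYMBOLS[symbol]](a, b)


def train(rounds=2000, seed=0):
    """Run the self-play loop and return the learned strategy weights."""
    rng = random.Random(seed)
    # One weight per (symbol, strategy); the solver starts with no preference.
    weights = {s: {name: 1.0 for name in STRATEGIES} for s in SYMBOLS}
    for _ in range(rounds):
        task = propose(rng)
        a, symbol, b = task
        names = list(weights[symbol])
        # Solver: sample a strategy in proportion to its current weight.
        choice = rng.choices(names, weights=[weights[symbol][n] for n in names])[0]
        answer = STRATEGIES[choice](a, b)
        # Reinforce the chosen strategy only when the verifier accepts the answer.
        if verify(task, answer):
            weights[symbol][choice] += 1.0
    return weights


def greedy_policy(weights):
    """Pick the highest-weight strategy for each symbol."""
    return {s: max(w, key=w.get) for s, w in weights.items()}
```

Calling `greedy_policy(train())` returns the strategy the solver has come to prefer for each symbol. The only learning signal is the verifier's accept or reject decision, which is the point the paragraph above makes in plain language.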

Why does it matter?

This matters because it suggests AI can learn new skills more independently, without large amounts of human-provided data, which could make advanced technology cheaper to build and more widely accessible.

Abstract

Absolute Zero Reasoner (AZR) achieves state-of-the-art performance on coding and mathematical reasoning tasks through self-generated, verified learning without external data.