R-Zero: Self-Evolving Reasoning LLM from Zero Data
Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu
2025-08-08
Summary
This paper introduces R-Zero, a framework in which two roles derived from the same AI model work together to improve without any human-labeled data. One role creates challenging questions while the other tries to solve them, and the two keep learning from each other in a loop.
What's the problem?
Training large language models to reason usually requires large collections of human-designed tasks and labels, which are costly to produce and cap how far a model can improve on its own.
What's the solution?
R-Zero starts from a single base model and instantiates two roles: a Challenger that generates tasks near the edge of the Solver's current ability, and a Solver that attempts those tasks. The two are trained in an alternating cycle, continually challenging and learning from each other, so reasoning ability improves without any pre-made training data.
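The cycle above can be sketched as a toy simulation. Everything here is an illustrative assumption, not the paper's actual training procedure: skill and difficulty are single numbers, and the update rules (the 0.1 difficulty margin, the 0.15 competence band, the fixed skill increment) are invented for the sketch.

```python
# Toy sketch of the R-Zero Challenger/Solver co-evolution loop.
# All numeric values and update rules are illustrative assumptions,
# not the training procedure described in the paper.

def challenger_propose(solver_skill: float) -> float:
    """Challenger: propose a task slightly harder than the Solver's skill."""
    return solver_skill + 0.1  # assumed difficulty margin

def solver_attempt(skill: float, difficulty: float) -> bool:
    """Solver: deterministic toy rule -- succeed if the task is within reach."""
    return difficulty <= skill + 0.15  # assumed competence band

def co_evolve(iterations: int = 10, skill: float = 1.0) -> float:
    """Alternate the two roles; each solved task raises the Solver's skill,
    which in turn pushes the Challenger to propose harder tasks."""
    for _ in range(iterations):
        task = challenger_propose(skill)
        if solver_attempt(skill, task):
            skill += 0.1  # assumed improvement from an informative task
    return skill

print(co_evolve())  # skill grows as the pair bootstraps itself
```

The key property the sketch captures is that the Challenger always targets the frontier of the Solver's ability, so every iteration produces a task that is hard but solvable, and the pair bootstraps itself without any external curriculum.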
Why does it matter?
It demonstrates a way for AI systems to generate their own curriculum and get smarter without relying on humans to create tasks, which could help develop more advanced and capable AI systems in the future.
Abstract
R-Zero is a self-evolving framework that autonomously generates and learns from its own training data, improving reasoning capabilities in LLMs without human-curated tasks.