ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong

2025-06-02

Summary

This paper introduces ProRL, a method that trains language models with reinforcement learning for much longer than usual, helping the models discover new and better ways to solve problems and reason through difficult questions.

What's the problem?

The problem is that even though reinforcement learning can make language models better at answering questions, it has been unclear whether it actually teaches them new reasoning strategies or only reinforces ones the base model already had. Short training runs often fail to push models beyond their original abilities, especially on more complex tasks.

What's the solution?

The researchers used prolonged reinforcement learning: they kept training the models over many more steps, rewarding correct reasoning, while using stabilizing techniques such as a KL-divergence penalty and periodic resets of the reference policy so the long run stays stable. This extra training helped the models discover reasoning strategies their base models never produced and improved their performance compared to models that didn't get the extra practice.
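To make the recipe concrete, here is a minimal, hypothetical sketch of the general idea: sample actions, reward correct ones, apply a policy-gradient update regularized by a KL penalty against a reference policy, and periodically reset that reference. This is a toy PyTorch illustration, not the paper's implementation; every name and hyperparameter (vocab, kl_coef, reset_every, the reward function) is made up for the example.

    import torch
    import torch.nn.functional as F

    # Toy setup: a single categorical "policy" over a small action vocabulary.
    vocab, n_steps, batch = 8, 2000, 64
    policy_logits = torch.zeros(vocab, requires_grad=True)
    ref_logits = policy_logits.detach().clone()  # frozen reference policy
    opt = torch.optim.Adam([policy_logits], lr=1e-2)

    kl_coef = 0.1      # KL penalty keeps the policy close to the reference
    reset_every = 500  # periodically re-anchor the reference (hypothetical schedule)

    def reward(actions: torch.Tensor) -> torch.Tensor:
        # Stand-in verifiable reward: one action is "correct".
        return (actions == 3).float()

    for step in range(n_steps):
        logp_all = F.log_softmax(policy_logits, dim=-1)
        probs = logp_all.exp()
        actions = torch.multinomial(probs.detach(), batch, replacement=True)
        adv = reward(actions)
        adv = adv - adv.mean()  # baseline-subtracted advantage

        # Policy-gradient loss plus KL(policy || reference) regularization.
        pg_loss = -(adv * logp_all[actions]).mean()
        kl = (probs * (logp_all - F.log_softmax(ref_logits, dim=-1))).sum()
        loss = pg_loss + kl_coef * kl

        opt.zero_grad()
        loss.backward()
        opt.step()

        # Reference policy reset: re-anchor the KL term so training can continue
        # without the penalty pinning the policy to its starting point forever.
        if (step + 1) % reset_every == 0:
            ref_logits = policy_logits.detach().clone()

In a real setup the categorical toy policy would be a full language model, the reward would come from verifiable checks on its answers, and the prolonged schedule would span many more updates; the KL control and reference resets play the same stabilizing role.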

Why it matters?

This is important because it shows that with the right kind of training, AI models can keep getting smarter and more creative, which could make them even more helpful for things like tutoring, research, and solving real-world problems.

Abstract

Prolonged reinforcement learning training (ProRL) uncovers novel reasoning strategies in language models; the trained models outperform their base models, suggesting a meaningful expansion of reasoning capabilities.