A Survey of Reinforcement Learning for Large Reasoning Models
Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, Yu Fu, Xingtai Lv, Yuchen Zhang, Sihang Zeng, Shang Qu, Haozhan Li, Shijie Wang, Yuru Wang, Xinwei Long, Fangfu Liu, Xiang Xu, Jiaze Ma
2025-09-11
Summary
This paper is a comprehensive overview of how Reinforcement Learning (RL) is being used to improve the reasoning abilities of Large Language Models (LLMs), essentially turning them into Large Reasoning Models, or LRMs.
What's the problem?
While using RL to boost LLM reasoning has been successful, especially in areas like math and coding, simply scaling up RL training is no longer straightforward. There are significant hurdles in the sheer amount of computing power needed, in designing RL algorithms that work well for these complex models, in gathering enough high-quality training data, and in building the infrastructure to support it all. The field needs a strategic look at how to scale these methods if it is ever to reach Artificial Superintelligence (ASI).
What's the solution?
The authors thoroughly examine recent research applying RL to LLMs and LRMs, focusing on work published since the release of DeepSeek-R1. They break the field down into its key parts: the foundational components, the core problems researchers face, the resources used for training, and how the resulting models are applied downstream. The review aims to pinpoint where future research can have the biggest impact.
Why it matters?
This work is important because it provides a roadmap for a rapidly evolving field. By identifying the bottlenecks and opportunities in using RL to enhance LLM reasoning, it helps researchers focus their efforts and accelerate progress toward more powerful and intelligent AI systems. It is a crucial step in moving beyond merely large language models to models that can truly *think* and solve complex problems.
Abstract
In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs