An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Zhipeng Chen, Yingqian Min, Beichen Zhang, Jie Chen, Jinhao Jiang, Daixuan Cheng, Wayne Xin Zhao, Zheng Liu, Xu Miao, Yang Lu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen
2025-03-10
Summary
This paper examines how to improve AI models that reason step by step, focusing on reinforcement learning (RL) as a way to make these models better at solving complex problems.
What's the problem?
AI models are getting really good at many tasks, but they still struggle with complex reasoning that requires careful, step-by-step thinking. Researchers want to make these models 'think slower', the way humans do when tackling difficult problems, but it is not yet clear how to achieve this effectively.
What's the solution?
The researchers tried different ways to train AI models using reinforcement learning, which is like teaching the AI by letting it practice and learn from its mistakes. They tested this on various models, including some that were already strong at reasoning. They also experimented with giving the AI access to tools, which helped it solve problems even more reliably. One of their improved models became very good at solving tough math problems from a competition called AIME.
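The "practice and learn from mistakes" loop can be illustrated with a toy sketch: the model proposes an answer, a rule-based checker scores it (correct final answer gets reward 1, anything else gets 0), and the policy shifts probability toward answers that earned reward. Everything below is illustrative, not the authors' actual training code; the candidate answers, update rule, and hyperparameters are invented for the example.

```python
import random

def reward(answer: str, ground_truth: str) -> float:
    """Rule-based, verifiable reward: 1.0 for the exactly correct
    final answer, 0.0 otherwise (no learned reward model needed)."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

# A stand-in "policy": a categorical distribution over candidate answers.
# A real system would instead sample full reasoning traces from an LLM.
candidates = ["70", "72", "74"]
probs = [1 / 3, 1 / 3, 1 / 3]
ground_truth = "72"
lr = 0.1  # step size for the toy update

random.seed(0)
for step in range(200):
    # Sample an answer from the current policy and score it.
    i = random.choices(range(len(candidates)), weights=probs)[0]
    r = reward(candidates[i], ground_truth)
    # REINFORCE-flavored update: raise the probability of rewarded answers.
    probs[i] += lr * r * (1 - probs[i])
    # Renormalize so probs remains a valid distribution.
    total = sum(probs)
    probs = [p / total for p in probs]

best = candidates[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

After a few hundred sampled "attempts", probability mass concentrates on the answer that the checker rewards, which is the core mechanism RL training exploits at much larger scale with full reasoning traces.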
Why it matters?
This matters because it shows we can make AI 'smarter' at solving complex problems, which could help in fields like science, medicine, or engineering where careful reasoning is crucial. By making AI think more like humans when tackling hard questions, we might be able to use these models to help solve real-world problems that require deep thought and analysis. The fact that the authors share their work openly also means other researchers can build on these findings, potentially leading to even more capable AI in the future.
Abstract
In this report, we present the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, enhancing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models. This approach achieves a remarkable accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.