100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
Chong Zhang, Yue Deng, Xiang Lin, Bin Wang, Dianwen Ng, Hai Ye, Xingxuan Li, Yao Xiao, Zhanfeng Mo, Qi Zhang, Lidong Bing
2025-05-01
Summary
This paper surveys what researchers have learned in the 100 days since DeepSeek-R1 was released, especially from efforts to replicate and study how reasoning language models are trained and improved.
What's the problem?
It's hard to know which training methods genuinely make language models better at reasoning, because results depend heavily on how the data is prepared and which techniques are used. This makes it tricky to improve these models reliably.
What's the solution?
The researchers reviewed recent studies that tried to replicate important experiments, paying close attention to how data is prepared and how rewards are assigned during training. They also pointed out the main open challenges and suggested ways to make future models even better.
Why it matters?
This matters because understanding what actually works in training reasoning models helps everyone build smarter, more reliable AI, which is important for things like education, science, and solving real-world problems.
Abstract
Recent replication studies explore supervised fine-tuning and reinforcement learning from verifiable rewards for reasoning language models, focusing on data preparation and method design, and highlight remaining challenges and potential enhancements.