RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
Hongzhi Zhang, Jia Fu, Jingyuan Zhang, Kai Fu, Qi Wang, Fuzheng Zhang, Guorui Zhou
2025-07-17
Summary
This paper introduces RLEP, a reinforcement learning method that improves how large language models learn to reason, especially on math problems, by replaying past experiences in which the model performed well.
What's the problem?
Training large language models to get better at reasoning tasks is slow and sample-inefficient: the models don't learn equally well from every training example, and the successful reasoning paths they discover along the way are often discarded rather than reused.
What's the solution?
The authors designed a two-phase approach: the model first explores different reasoning paths, then repeatedly replays the best verified ones, a technique called experience replay, to reinforce learning from high-quality examples. This helps the model converge faster and perform better on complex math reasoning tasks.
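The summary does not include code, but the replay idea can be sketched roughly: keep a buffer of verified successful trajectories from the exploration phase, then mix samples from it into each fresh training batch. All names, thresholds, and the mixing ratio below are hypothetical illustrations, not the paper's actual implementation:

```python
import random


class ExperienceReplayBuffer:
    """Hypothetical buffer holding high-reward reasoning trajectories."""

    def __init__(self, reward_threshold=1.0, capacity=1000):
        self.reward_threshold = reward_threshold
        self.capacity = capacity
        self.buffer = []

    def add(self, trajectory, reward):
        # Keep only verified successes (e.g., a correct final answer).
        if reward >= self.reward_threshold:
            self.buffer.append(trajectory)
            if len(self.buffer) > self.capacity:
                self.buffer.pop(0)  # drop the oldest entry

    def sample(self, k):
        # Draw up to k replayed successes for the next training batch.
        k = min(k, len(self.buffer))
        return random.sample(self.buffer, k)


def build_batch(fresh_rollouts, buffer, replay_fraction=0.5):
    """Mix newly generated rollouts with replayed successful trajectories."""
    n_replay = int(len(fresh_rollouts) * replay_fraction)
    return fresh_rollouts + buffer.sample(n_replay)
```

In this sketch, anchoring each batch on replayed successes is what steers the policy back toward reasoning paths already known to work, rather than relying on fresh exploration alone.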
Why does it matter?
This matters because it makes AI models smarter and quicker at solving difficult reasoning problems, helping improve applications like math tutoring, scientific research, and any field that requires deep logical thinking.
Abstract
RLEP, a two-phase reinforcement learning framework with experience replay, enhances large language model training by focusing on high-quality trajectories, leading to faster convergence and improved performance on math datasets.