SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Yuqian Fu, Tinghong Chen, Jiajun Chai, Xihuai Wang, Songjun Tu, Guojun Yin, Wei Lin, Qichao Zhang, Yuanheng Zhu, Dongbin Zhao
2025-06-25
Summary
This paper introduces SRFT, a single-stage method that combines supervised fine-tuning (SFT) and reinforcement learning (RL) to improve the reasoning ability of language models.
What's the problem?
Language models can learn from examples (supervised fine-tuning) or from trial and error (reinforcement learning), but these methods are usually applied in separate sequential stages, and combining them smoothly and effectively has remained a significant challenge.
What's the solution?
The researchers created SRFT, which uses entropy-aware weighting to balance learning from examples and learning from the model's own exploration at the same time, unifying both objectives in a single training process and improving the model's ability to reason and solve problems.
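To make the idea concrete, here is a minimal PyTorch sketch of an entropy-aware weighted objective in the spirit of SRFT. The `token_entropy` helper, the sigmoid gate, and the `alpha` parameter are illustrative assumptions, not the paper's exact formulation; the paper itself defines the precise weighting scheme.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    # Shannon entropy of the model's next-token distribution, per position.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)  # (batch, seq_len)

def srft_style_loss(logits_demo, demo_targets, rl_loss, alpha=1.0):
    """Combine an SFT loss on demonstrations with an RL loss in one step,
    weighting the SFT term by the policy's entropy so that demonstrations
    contribute more when the model is uncertain (a sketch, not the paper's
    exact rule)."""
    # Standard SFT objective: cross-entropy on expert demonstration tokens.
    sft_loss = F.cross_entropy(
        logits_demo.reshape(-1, logits_demo.size(-1)),
        demo_targets.reshape(-1),
    )
    # Average policy entropy over the demonstration tokens.
    ent = token_entropy(logits_demo).mean()
    # Hypothetical entropy-aware gate in (0, 1); detached so the weight
    # itself does not receive gradients.
    w_sft = torch.sigmoid(alpha * ent).detach()
    # Single combined objective: no separate SFT and RL stages.
    return w_sft * sft_loss + (1.0 - w_sft) * rl_loss
```

Detaching the weight keeps the gate from distorting the gradient of either loss term; under this assumed scheme, a high-entropy (uncertain) policy leans on demonstrations, while a confident policy learns mostly from its own rollouts.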
Why it matters?
This matters because it helps build AI models that can solve complex reasoning tasks more accurately and efficiently, with applications in areas such as math problem-solving and decision-making.
Abstract
Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that integrates supervised fine-tuning and reinforcement learning through entropy-aware weighting, unifying both fine-tuning paradigms to improve the reasoning performance of language models.