
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Yongliang Wu, Yizhou Zhou, Zhou Ziheng, Yingzhe Peng, Xinyu Ye, Xinting Hu, Wenbo Zhu, Lu Qi, Ming-Hsuan Yang, Xu Yang

2025-08-08


Summary

This paper introduces Dynamic Fine-Tuning (DFT), a new method that helps large language models learn better by changing how strongly the model adjusts itself during training, leading to better results than regular Supervised Fine-Tuning (SFT).

What's the problem?

The problem is that standard Supervised Fine-Tuning applies the same fixed update rule to every part of the training data, which can limit how well the model adapts and generalizes to new tasks or data.

What's the solution?

The solution is DFT, which dynamically rescales the gradients during training, adjusting the strength of each learning signal based on the model's own predictions rather than treating every token equally. This helps the model learn more effectively and improves its ability to generalize.
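The rescaling idea can be illustrated with a minimal NumPy sketch. This assumes DFT multiplies each token's cross-entropy term by the model's predicted probability for that token (with gradients stopped through the scaling factor, as in the paper's description); the function name, array shapes, and random inputs here are illustrative, not the authors' implementation.

```python
import numpy as np

def sft_and_dft_losses(logits, target_ids):
    """Compare the standard SFT loss with a DFT-style rescaled variant.

    logits: (seq_len, vocab_size) array of model outputs.
    target_ids: (seq_len,) array of ground-truth token ids.
    """
    # Softmax over the vocabulary (numerically stable form).
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Probability the model assigns to each ground-truth token.
    p_target = probs[np.arange(len(target_ids)), target_ids]

    nll = -np.log(p_target)   # per-token SFT loss (cross-entropy)
    dft = p_target * nll      # DFT: rescale by the (detached) probability
    return nll.mean(), dft.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))      # toy sequence: 4 tokens, vocab of 8
targets = np.array([1, 3, 0, 7])
sft_loss, dft_loss = sft_and_dft_losses(logits, targets)
```

Since each target probability lies in (0, 1), the rescaling shrinks every token's contribution, down-weighting tokens the model already finds unlikely instead of letting them dominate the gradient.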

Why it matters?

This matters because better generalization means the language model can perform well on a wider range of tasks and questions, making AI more useful and versatile in real-world applications.

Abstract

Dynamic Fine-Tuning (DFT) improves the generalization of Large Language Models (LLMs) by dynamically rescaling gradients, outperforming standard Supervised Fine-Tuning (SFT) and showing competitive results in offline reinforcement learning.