On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Yongliang Wu, Yizhou Zhou, Zhou Ziheng, Yingzhe Peng, Xinyu Ye, Xinting Hu, Wenbo Zhu, Lu Qi, Ming-Hsuan Yang, Xu Yang
2025-08-08
Summary
This paper introduces Dynamic Fine-Tuning (DFT), a method that helps large language models learn better by rescaling how strongly the model adjusts itself on each token during training, leading to better results than regular Supervised Fine-Tuning (SFT).
What's the problem?
The problem is that standard Supervised Fine-Tuning applies the same fixed update rule to every token, regardless of how likely the model already finds that token, which can limit how well the model adapts and generalizes to new tasks or data.
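Viewed through the paper's reinforcement-learning lens, this can be made precise. The following is a hedged reconstruction of the argument, with \(\pi_\theta\) the model's policy and \(y^*\) the demonstration token sequence: the SFT gradient can be rewritten so that it looks like a policy gradient carrying an implicit reward,

\[
\nabla_\theta \mathcal{L}_{\mathrm{SFT}}
= -\,\mathbb{E}_{(x,\,y^*)}\!\left[\nabla_\theta \log \pi_\theta(y^* \mid x)\right]
= -\,\mathbb{E}_{(x,\,y^*)}\!\left[\frac{1}{\pi_\theta(y^* \mid x)} \cdot \pi_\theta(y^* \mid x)\,\nabla_\theta \log \pi_\theta(y^* \mid x)\right],
\]

which matches the policy-gradient form \(\mathbb{E}_{y \sim \pi_\theta}[\,r(y)\,\nabla_\theta \log \pi_\theta(y \mid x)\,]\) with an implicit reward \(r(y) \propto 1/\pi_\theta(y^* \mid x)\). That inverse-probability factor puts the largest weight on exactly the tokens the model currently finds least likely, and rectifying it is what motivates DFT's rescaling.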
What's the solution?
The solution is DFT, which dynamically rescales the gradients, adjusting the strength of each token's learning signal according to the model's own probability for that token during training. This helps the model learn more effectively and improves its ability to generalize.
Why it matters?
This matters because better generalization means the language model can perform well on a wider range of tasks and questions, making AI more useful and versatile in real-world applications.
Abstract
Dynamic Fine-Tuning (DFT) improves the generalization of Large Language Models (LLMs) by dynamically rescaling gradients, outperforming standard Supervised Fine-Tuning (SFT) and showing competitive results in offline reinforcement learning.