SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine, Yi Ma
2025-01-29

Summary
This paper examines how two common ways of post-training AI models, called Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), affect how well those models can learn new rules and apply them in situations they have not seen before. The researchers tested both methods using a card game and a navigation task to see which one helps AI better understand and use new rules or settings.
What's the problem?
AI models are getting really good at specific tasks, but they often struggle when faced with new situations that differ from what they were trained on. Researchers weren't sure which post-training method, SFT or RL, is better at helping AI adapt to these new situations. They wanted to find out whether one method just makes the AI memorize its training data, while the other helps it truly understand and apply what it learned.
What's the solution?
The researchers set up two test scenarios: an arithmetic card game called GeneralPoints and a real-world navigation task called V-IRL. They trained AI models with both SFT and RL, then changed the rules or the visual environment to see how well the models could adapt. RL-trained models were much better at handling these new variants, in both the card game and the navigation task. SFT-trained models, by contrast, tended to memorize the exact situations they were trained on and struggled when things changed.
Why it matters?
This research matters because it helps us create smarter, more flexible AI systems. If we use RL methods, we might be able to build AI that can handle real-world problems better, even when faced with new situations it hasn't seen before. This could lead to more useful AI in areas like self-driving cars, robots, or virtual assistants that can adapt to different user needs. However, the study also shows that SFT is still important as a first step before using RL, which could help developers create more effective training processes for AI models.
Abstract
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize to out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrate the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.
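To make the phrase "outcome-based reward" more concrete, below is a minimal sketch of what such a reward could look like for a GeneralPoints-style arithmetic task: the policy's only feedback is whether its final expression is correct, with no credit for intermediate reasoning steps. The target value of 24, the card-usage constraint, and the specific reward magnitudes are illustrative assumptions, not details taken from the paper.

```python
# Sketch of an outcome-based reward for a GeneralPoints-style arithmetic task.
# The model emits an arithmetic expression over the card values; the reward
# depends only on whether the final result is correct, not on how it was derived.
# Target value (24), card-usage rule, and reward values are illustrative assumptions.

import re


def outcome_reward(expression: str, cards: list[int], target: int = 24) -> float:
    """Return +1.0 if the expression uses exactly the given cards and equals
    the target, otherwise -1.0. Only the final outcome is scored."""
    # Reject anything other than digits, whitespace, and basic arithmetic operators.
    if not re.fullmatch(r"[\d\s()+\-*/]+", expression):
        return -1.0

    # Require each card value to appear exactly once (illustrative constraint).
    used = sorted(int(tok) for tok in re.findall(r"\d+", expression))
    if used != sorted(cards):
        return -1.0

    try:
        value = eval(expression)  # input is restricted to arithmetic tokens above
    except (SyntaxError, ZeroDivisionError):
        return -1.0

    return 1.0 if abs(value - target) < 1e-6 else -1.0


if __name__ == "__main__":
    print(outcome_reward("(10 - 4) * (2 + 2)", cards=[10, 4, 2, 2]))  # 1.0, equals 24
    print(outcome_reward("10 + 4 + 2 + 2", cards=[10, 4, 2, 2]))      # -1.0, equals 18
```

In an RL loop, this scalar would be the only training signal for the generated answer, which is what distinguishes an outcome-based reward from process-level supervision of intermediate steps.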