SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen, Haoqin Tu, Fali Wang, Hui Liu, Xianfeng Tang, Xinya Du, Yuyin Zhou, Cihang Xie
2025-04-17

Summary
This paper investigates whether supervised fine-tuning (SFT) or reinforcement learning (RL) is the better way to train large AI models that understand both images and language, especially on tasks that require genuine reasoning.
What's the problem?
Most current models are trained to imitate example answers written by humans or by stronger teacher models, so they often reproduce the surface patterns of reasoning instead of truly thinking through a problem. This limits their ability to find original solutions or handle new situations, especially on tasks that demand more than memorized answers.
What's the solution?
The researchers compared standard supervised fine-tuning, where the model learns by copying reference answers, against a reinforcement learning approach that instead rewards the model for working out correct answers on its own. They found that reinforcement learning fostered genuine reasoning skills and outperformed models trained purely by imitation, and that applying supervised fine-tuning first could actually hinder the later reinforcement learning stage.
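To make the contrast concrete, here is a minimal, hypothetical sketch, not the paper's actual implementation. It contrasts a supervised fine-tuning loss, which pushes the model to reproduce a reference answer token by token, with a plain REINFORCE-style reward-weighted update (a simplified stand-in for the GRPO-family methods typically used in R1-style training): the model samples its own answer and is reinforced only when a reward signal, e.g. from an answer verifier, judges it correct. All tensor shapes, the reward value, and the toy model outputs are illustrative assumptions.

```python
# Hypothetical sketch contrasting SFT and reward-based RL objectives.
# All model/reward details are illustrative placeholders, not the paper's code.
import torch
import torch.nn.functional as F

def sft_loss(logits, target_ids):
    """Supervised fine-tuning: cross-entropy against a reference answer.
    The model is pushed to reproduce the reference token by token,
    i.e. to imitate, regardless of whether the reasoning is sound.
    logits: (seq_len, vocab_size); target_ids: (seq_len,)."""
    return F.cross_entropy(logits, target_ids)

def reinforce_loss(logits, sampled_ids, reward):
    """REINFORCE-style RL: weight the log-probability of the model's
    *own* sampled answer by a scalar reward (e.g. 1.0 if a verifier
    judges the answer correct, 0.0 otherwise). Only rewarded behavior
    is reinforced, so the model has to actually solve the problem."""
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, sampled_ids.unsqueeze(1)).squeeze(1)
    return -(reward * chosen).sum()

# Toy usage with random tensors standing in for a real LVLM's outputs.
vocab, seq_len = 32, 5
logits = torch.randn(seq_len, vocab, requires_grad=True)
reference = torch.randint(vocab, (seq_len,))  # human/teacher answer
sampled = torch.randint(vocab, (seq_len,))    # model's own rollout
reward = 1.0                                  # verifier said "correct"

print("SFT loss:", sft_loss(logits, reference).item())
print("RL loss:", reinforce_loss(logits, sampled, reward).item())
```

The design difference is the point: the SFT gradient always pulls toward the reference text, while the RL gradient only strengthens answers the model generated itself and that earned a reward, which is what lets genuine reasoning strategies emerge instead of imitation.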
Why it matters?
This matters because models that can genuinely reason, rather than just copy, will be far more useful and trustworthy in real-world situations, such as solving unfamiliar problems, supporting decisions, and helping people in cases they were never explicitly trained on.
Abstract
Supervised fine-tuning can hinder reinforcement learning in Large Vision-Language Models by inducing imitative reasoning, whereas a novel reinforcement learning approach improves genuine reasoning capabilities and outperforms existing models.