UFT: Unifying Supervised and Reinforcement Fine-Tuning
Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
2025-05-27
Summary
This paper introduces Unified Fine-Tuning (UFT), a new way to improve how large language models are trained after their initial learning, by blending the strengths of both supervised and reinforcement fine-tuning methods.
What's the problem?
The problem is that supervised fine-tuning, where models learn from examples with correct answers, and reinforcement fine-tuning, where models learn from feedback or rewards, each have their own weaknesses. Supervised fine-tuning can cause models to overfit to their training examples, hurting generalization, while reinforcement fine-tuning can converge slowly and sometimes becomes unstable. This makes it hard for language models to learn quickly and handle new situations well.
What's the solution?
The authors created UFT, which brings together supervised and reinforcement fine-tuning in a unified approach. This method lets the model learn from both direct examples and feedback, helping it generalize better and learn faster than using either method alone.
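To make the idea concrete, here is a minimal sketch of one way to blend a supervised objective with a reinforcement objective into a single training loss. The function names, the simple linear interpolation weight `alpha`, and the REINFORCE-style reward term are illustrative assumptions for this summary, not the exact objective used by UFT.

```python
import math

def sft_loss(probs_correct):
    # Supervised term: average negative log-likelihood
    # of the demonstrated (correct) tokens.
    return -sum(math.log(p) for p in probs_correct) / len(probs_correct)

def rl_loss(log_probs, rewards):
    # Reinforcement term: REINFORCE-style reward-weighted
    # log-likelihood of the model's sampled tokens.
    return -sum(r * lp for lp, r in zip(log_probs, rewards)) / len(rewards)

def blended_loss(probs_correct, log_probs, rewards, alpha=0.5):
    # alpha interpolates between pure supervised fine-tuning (alpha=1.0)
    # and pure reinforcement fine-tuning (alpha=0.0). The linear blend
    # is a hypothetical simplification, not UFT's actual formulation.
    return alpha * sft_loss(probs_correct) + (1 - alpha) * rl_loss(log_probs, rewards)
```

With `alpha=1.0` the model trains only on direct examples; with `alpha=0.0` it trains only on reward feedback; intermediate values let both signals shape the same update, which is the intuition behind combining the two regimes.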
Why does it matter?
This is important because it means language models can become more capable, adapt to new tasks more easily, and reach strong performance in less training time. This could lead to better AI helpers for things like writing, research, and problem-solving.
Abstract
A new post-training method, Unified Fine-Tuning (UFT), improves upon supervised and reinforcement fine-tuning for large language models by combining their benefits, achieving better generalization and faster convergence.