SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Yuxin Xiao, Shujian Zhang, Wenxuan Zhou, Marzyeh Ghassemi, Sanqiang Zhao

2024-10-13

Summary

This paper introduces SFTMix, a new method that improves how large language models (LLMs) learn to follow instructions by using a technique called Mixup.

What's the problem?

When training LLMs to follow instructions, traditional approaches depend on high-quality supervised fine-tuning datasets, which are expensive and time-consuming to create. The training itself uses next-token prediction (NTP), which treats every example identically and so does not fully exploit the data's intrinsic properties; as a result, gains come mainly from costly data curation and remain limited.
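For context, NTP is simply token-level cross-entropy with the targets shifted by one position. A minimal PyTorch sketch (the names `logits` and `target_ids` are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def ntp_loss(logits, target_ids):
    """Next-token prediction loss: cross-entropy between the model's
    prediction at position t and the actual token at position t+1.
    logits: (batch, seq_len, vocab); target_ids: (batch, seq_len)."""
    shifted_logits = logits[:, :-1, :]   # predictions for each position
    shifted_targets = target_ids[:, 1:]  # the token each position should emit
    return F.cross_entropy(
        shifted_logits.reshape(-1, shifted_logits.size(-1)),
        shifted_targets.reshape(-1),
    )
```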

What's the solution?

To address this, the authors developed SFTMix, which improves instruction-tuning performance without requiring a perfectly curated dataset. They observed that LLMs are more confident about some training examples than others, and argue that examples at different confidence levels should play distinct roles during training. SFTMix uses the model's training dynamics to separate confident examples from unconfident ones, then applies Mixup, a technique that interpolates between pairs of examples to create blended training data. This regularization reduces overfitting on the confident examples while propagating supervision signals that help the model learn the less confident ones, and it significantly outperforms standard NTP training across a variety of tasks.
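To make the Mixup step concrete, here is a minimal sketch, assuming a confident/unconfident pair of same-length sequences and a Hugging Face-style `lm_head` module. The function name, arguments, and the choice to interpolate final hidden states are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def mixup_ntp_loss(h_conf, y_conf, h_unconf, y_unconf, lm_head, alpha=0.2):
    """Illustrative Mixup step: interpolate the hidden states of a
    confident and an unconfident example, then blend their next-token
    losses with the same interpolation weight."""
    # Sample the Mixup weight from a Beta(alpha, alpha) distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * h_conf + (1.0 - lam) * h_unconf   # blended representation
    logits = lm_head(mixed)[:, :-1, :]              # next-token predictions
    flat = logits.reshape(-1, logits.size(-1))
    # Mixup on the labels: weight the loss against each example's targets.
    loss_c = F.cross_entropy(flat, y_conf[:, 1:].reshape(-1))
    loss_u = F.cross_entropy(flat, y_unconf[:, 1:].reshape(-1))
    return lam * loss_c + (1.0 - lam) * loss_u
```

Blending both the representations and the supervision targets is what lets the confident example's signal regularize learning on the unconfident one.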

Why it matters?

This research is important because it provides a more efficient way to train LLMs for instruction-following tasks, making it easier and cheaper to develop high-performing AI systems. By improving how these models learn from data, SFTMix could lead to better applications in areas like virtual assistants, customer support, and any task that requires understanding and following complex instructions.

Abstract

To induce desired behaviors in large language models (LLMs) for interaction-driven tasks, the instruction-tuning stage typically trains LLMs on instruction-response pairs using the next-token prediction (NTP) loss. Previous work aiming to improve instruction-tuning performance often emphasizes the need for higher-quality supervised fine-tuning (SFT) datasets, which typically involves expensive data filtering with proprietary LLMs or labor-intensive data generation by human annotators. However, these approaches do not fully leverage the datasets' intrinsic properties, resulting in high computational and labor costs, thereby limiting scalability and performance gains. In this paper, we propose SFTMix, a novel recipe that elevates instruction-tuning performance beyond the conventional NTP paradigm, without the need for well-curated datasets. Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles during the instruction-tuning process. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels, then applies a Mixup-based regularization to mitigate overfitting on confident examples while propagating supervision signals to improve learning on relatively unconfident ones. This approach enables SFTMix to significantly outperform NTP across a wide range of instruction-following and healthcare domain-specific SFT tasks, demonstrating its adaptability to diverse LLM families and scalability to datasets of any size. Comprehensive ablation studies further verify the robustness of SFTMix's design choices, underscoring its versatility in consistently enhancing performance across different LLMs and datasets in broader natural language processing applications.
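As a final illustration of the "training dynamics" idea in the abstract: one simple way to score an example's confidence is the mean log-probability the model assigns to its own tokens, tracked across training checkpoints. This scorer is a hypothetical sketch assuming a Hugging Face-style causal LM that returns `.logits`; the paper's exact criterion may differ:

```python
import torch

@torch.no_grad()
def example_confidence(model, input_ids, attention_mask):
    """Score each example's confidence as the mean log-probability the
    model assigns to its own target tokens; averaging this across training
    checkpoints gives a simple training-dynamics confidence signal."""
    out = model(input_ids=input_ids, attention_mask=attention_mask)
    logprobs = out.logits[:, :-1, :].log_softmax(dim=-1)
    targets = input_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mask = attention_mask[:, 1:].float()   # ignore padding positions
    return (token_lp * mask).sum(-1) / mask.sum(-1)  # per-example mean
```

Examples with persistently high scores would be treated as confident and regularized against overfitting, while low-scoring ones receive the propagated supervision signal.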