Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

Yingda Chen, Xingjun Wang, Jintao Huang, Yunlin Mao, Daoze Zhang, Yuze Zhao

2024-10-18

Summary

This paper presents an approach for improving the ability of large language models (LLMs) to generate long outputs by fine-tuning them on a small amount of high-quality training data.

What's the problem?

As LLMs evolve to support longer context windows, many still struggle to produce lengthy outputs. A key cause is the scarcity of long-output examples in alignment training data: without enough of them, models never learn to sustain generation at length, and they fall short when asked for long responses.
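
To make this data gap concrete, the sketch below profiles response lengths in an instruction-tuning dataset. The record layout (dicts with "instruction" and "response" fields) and the length buckets are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: check how rare long responses are in an
# instruction-tuning dataset. Field names and buckets are illustrative.
from collections import Counter

def length_bucket(num_words: int) -> str:
    """Assign a response to a coarse word-count bucket."""
    if num_words < 500:
        return "<500 words"
    if num_words < 2000:
        return "500-2000 words"
    return ">=2000 words"

def output_length_profile(records: list[dict]) -> Counter:
    """Count responses per length bucket."""
    return Counter(length_bucket(len(r["response"].split())) for r in records)

# Toy example: alignment corpora are typically dominated by short responses.
sample = [
    {"instruction": "Summarize this article.", "response": "A short answer. " * 20},
    {"instruction": "Write a long report.", "response": "word " * 3000},
]
print(output_length_profile(sample))
# Counter({'<500 words': 1, '>=2000 words': 1})
```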

What's the solution?

To address this, the authors explore how high-quality data can unlock long-output generation. Rather than re-aligning a foundation model from scratch, they start from models that have already been aligned with human instructions or conversations. By carefully selecting and curating the training data, they show that significant improvements in long-output generation are achievable with only a small fraction of the training data and compute used by prior approaches. They tested the recipe across multiple models and found consistent gains.
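
The authors have released their curated dataset and tuning code; as a rough illustration of the general recipe, the sketch below fine-tunes an already-aligned chat model on a small curated set of long-output examples with HuggingFace TRL. The model name, data file, and hyperparameters are assumptions for illustration, not the authors' actual settings (and SFTConfig field names vary across TRL versions).

```python
# A minimal supervised fine-tuning sketch in the spirit of the paper's recipe:
# start from a human-aligned (instruct/chat) model and tune it on a small,
# curated set of long-output examples. All names and values are illustrative.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical curated long-writing data, e.g. JSONL with prompt/completion pairs.
dataset = load_dataset("json", data_files="curated_long_output.jsonl", split="train")

config = SFTConfig(
    output_dir="instruct-model-long-output",
    num_train_epochs=2,             # small dataset, few passes
    per_device_train_batch_size=1,  # long sequences are memory-hungry
    gradient_accumulation_steps=8,
    learning_rate=1e-5,             # gentle updates preserve existing alignment
    max_seq_length=16384,           # room for long targets (renamed in newer TRL)
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-7B-Instruct",  # any already-aligned chat model
    train_dataset=dataset,
    args=config,
)
trainer.train()
```

Starting from an aligned model means the tuning only has to extend output length rather than teach instruction-following from scratch, which is why a small curated dataset and a low learning rate can suffice.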

Why it matters?

This research is important because it shows that enhancing LLMs for long output generation doesn't always require massive amounts of new data or extensive retraining. By focusing on the quality of the training data, developers can make LLMs more effective and efficient, which is crucial for applications like storytelling, report generation, and any task that requires detailed and lengthy responses.

Abstract

As large language models rapidly evolve to support longer context, there is a notable disparity in their capability to generate output at greater lengths. Recent studies suggest that the primary cause of this imbalance may be the lack of long-output data during alignment training. In light of this observation, attempts have been made to re-align foundation models with data that fills the gap, resulting in models capable of generating lengthy output when instructed. In this paper, we explore the impact of data quality in tuning a model for long output, and the possibility of doing so from the starting point of human-aligned (instruct or chat) models. With careful data curation, we show that it is possible to achieve similar performance improvements in our tuned models with only a small fraction of the training data instances and compute. In addition, we assess the generalizability of such approaches by applying our tuning recipes to several models. Our findings suggest that, while out-of-the-box capacities for generating long output vary across models, our approach of tuning them with high-quality data using light compute consistently yields notable improvement across all models we experimented on. We have made our curated dataset for tuning long-writing capability, the implementations of model tuning and evaluation, and the fine-tuned models publicly available.