Instruction Following without Instruction Tuning

John Hewitt, Nelson F. Liu, Percy Liang, Christopher D. Manning

2024-09-27

Summary

This paper shows that language models can follow instructions without the usual extensive training on instruction-response pairs, a phenomenon the authors call implicit instruction tuning. It demonstrates that models can still learn to follow instructions effectively even when they are trained only on responses.

What's the problem?

Traditionally, teaching language models to follow instructions relies on instruction tuning, where the model is finetuned on pairs of instructions and their corresponding responses. Collecting and training on these paired examples is resource-intensive, however, and it is not obvious that they are strictly necessary. The challenge is to understand how models can come to follow instructions without relying heavily on such paired data.

What's the solution?

The researchers found that language models can learn to follow instructions by training them solely on responses, with no instructions attached. They also found that instruction-response training on a narrow domain of data (like poetry) still produces broad instruction-following behavior (like recipe generation). To begin to explain this, they showed that very simple, hand-written adjustments to a pretrained model's output distribution, such as gradually increasing the chance of ending a response, penalizing repetition, and uniformly shifting the probabilities of a small set of words, are enough to elicit instruction following.
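To make the response-only setup concrete, here is a minimal sketch (not the authors' code) contrasting standard instruction tuning with tuning on responses alone. The model name, example text, and prompt formatting are placeholder assumptions; in both cases the loss is ordinary next-token prediction, and the only difference is whether an instruction precedes the response and is masked out of the loss.

```python
# Sketch: instruction tuning vs. response-only tuning (illustrative, not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper uses larger pretrained models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

instruction = "Write a short recipe for lentil soup."          # hypothetical example
response = "Simmer lentils with onion, carrot, and cumin for 30 minutes."

# --- Standard instruction tuning: condition on the instruction,
# --- and compute the loss only on the response tokens.
prompt_ids = tokenizer(instruction + "\n", return_tensors="pt").input_ids
response_ids = tokenizer(response + tokenizer.eos_token, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # ignore instruction tokens in the loss

loss_instruction_tuned = model(input_ids=input_ids, labels=labels).loss

# --- Response-only tuning: the instruction never appears;
# --- the model is trained on the response text alone.
loss_response_only = model(input_ids=response_ids, labels=response_ids.clone()).loss

print(loss_instruction_tuned.item(), loss_response_only.item())
```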

Why it matters?

This research matters because it suggests that instruction-following ability is largely latent in pretrained models and can be surfaced with far less curated data than standard instruction tuning requires. By demonstrating that implicit instruction tuning yields effective instruction-following behavior, the findings suggest that time and resources can be saved when adapting these models while still achieving good performance, which could lead to more versatile AI systems that are easier to develop and deploy in various applications.

Abstract

Instruction tuning commonly means finetuning a language model on instruction-response pairs. We discover two forms of adaptation (tuning) that are deficient compared to instruction tuning, yet still yield instruction following; we call this implicit instruction tuning. We first find that instruction-response pairs are not necessary: training solely on responses, without any corresponding instructions, yields instruction following. This suggests pretrained models have an instruction-response mapping which is revealed by teaching the model the desired distribution of responses. However, we then find it's not necessary to teach the desired distribution of responses: instruction-response training on narrow-domain data like poetry still leads to broad instruction-following behavior like recipe generation. In particular, when instructions are very different from those in the narrow finetuning domain, models' responses do not adhere to the style of the finetuning domain. To begin to explain implicit instruction tuning, we hypothesize that very simple changes to a language model's distribution yield instruction following. We support this by hand-writing a rule-based language model which yields instruction following in a product-of-experts with a pretrained model. The rules are to slowly increase the probability of ending the sequence, penalize repetition, and uniformly change 15 words' probabilities. In summary, adaptations made without being designed to yield instruction following can do so implicitly.
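To illustrate the rule-based expert described at the end of the abstract, the following is a hedged sketch (not the authors' released implementation) of a product-of-experts decoder: the pretrained model's log-probabilities are added to the scores of a hand-written expert that slowly raises the end-of-sequence probability, penalizes repeated tokens, and uniformly boosts a fixed set of words. The model name, the specific 15-word list, and the numeric constants are illustrative assumptions, not the paper's values.

```python
# Sketch of a product-of-experts between a pretrained LM and a rule-based expert.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder pretrained model
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical word list; the paper uniformly changes 15 words' probabilities.
BOOSTED_WORDS = [" the", " a", " and", " to", " of", " in", " is", " for",
                 " with", " on", " that", " you", " it", " as", " are"]
boosted_ids = [tok.encode(w)[0] for w in BOOSTED_WORDS]

def rule_expert_scores(generated_ids, vocab_size, step, eos_id):
    """Unnormalized scores from three hand-written rules (constants are made up)."""
    scores = torch.zeros(vocab_size)          # uniform baseline
    scores[eos_id] += 0.1 * step              # slowly raise end-of-sequence probability
    for t in set(generated_ids):              # penalize tokens already generated
        scores[t] -= 2.0
    scores[boosted_ids] += 1.0                # uniformly boost the chosen words
    return scores

def product_of_experts_generate(prompt, max_new_tokens=60):
    ids = tok.encode(prompt)
    for step in range(max_new_tokens):
        with torch.no_grad():
            lm_logits = lm(torch.tensor([ids])).logits[0, -1]
        # A product of experts multiplies probabilities, i.e. adds log-probabilities.
        combined = torch.log_softmax(lm_logits, -1) + rule_expert_scores(
            ids, lm_logits.shape[-1], step, tok.eos_token_id)
        next_id = int(torch.argmax(combined))  # greedy decoding for simplicity
        ids.append(next_id)
        if next_id == tok.eos_token_id:
            break
    return tok.decode(ids)

print(product_of_experts_generate("List three tips for baking bread.\n"))
```

Greedy decoding is used here only to keep the sketch short; normalization of the rule expert is omitted because it does not affect the argmax.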