Response Tuning: Aligning Large Language Models without Instruction

Seokhyun An, Hyounghun Kim

2024-10-10

Summary

This paper introduces Response Tuning (RT), a method for aligning large language models (LLMs) by training them only on responses, without conditioning on instructions.

What's the problem?

Typically, LLMs are turned into chat assistants through instruction tuning, i.e., fine-tuning on instruction-response pairs. Collecting high-quality pairs takes considerable time and resources, and the process may not fully exploit the capabilities that pre-trained models already have, making it challenging to create effective chat assistants.

What's the solution?

Response Tuning simplifies this by removing the instruction-conditioning step and focusing solely on supervising the response space. The researchers found that models trained only on responses can still handle a wide range of instructions effectively. They also found that adjusting the distribution of training responses improves alignment with user preferences and can teach the model to refuse unsafe queries (see the sketch below).
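To make the contrast concrete, here is a minimal sketch (not the authors' code) of how the training targets differ between standard instruction tuning and Response Tuning. The chat template strings and the tokenizer interface (a Hugging Face-style `encode` and `eos_token_id`) are assumptions for illustration; the essential point is which tokens receive a loss.

```python
# Minimal sketch contrasting instruction tuning with Response Tuning (RT).
# The template strings and tokenizer interface are illustrative assumptions,
# not the paper's exact implementation.

IGNORE_INDEX = -100  # conventional label value for tokens excluded from the loss


def build_instruction_tuning_example(tokenizer, instruction, response):
    """Instruction tuning: the model is conditioned on the instruction,
    and the loss is computed only on the response tokens."""
    prompt_ids = tokenizer.encode(
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )
    response_ids = tokenizer.encode(response) + [tokenizer.eos_token_id]
    return {
        "input_ids": prompt_ids + response_ids,
        # Instruction tokens are masked out; only response tokens are supervised.
        "labels": [IGNORE_INDEX] * len(prompt_ids) + response_ids,
    }


def build_response_tuning_example(tokenizer, response):
    """Response Tuning: the instruction is dropped entirely, so training
    supervises the response space alone."""
    response_ids = tokenizer.encode(response) + [tokenizer.eos_token_id]
    return {
        "input_ids": response_ids,
        # Every token is a response token, so every token is supervised.
        "labels": list(response_ids),
    }
```

At inference time the model is still prompted with the user's instruction; the sketch only changes what the model sees during training.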

Why it matters?

This research is important because it shows that LLMs can be aligned without extensive instruction data, making the training process more efficient. By supervising only the responses, this method could lead to better-performing chat assistants that are more helpful and safe for users.

Abstract

Instruction tuning (supervised fine-tuning using instruction-response pairs) is a foundational step in transitioning pre-trained Large Language Models (LLMs) into helpful and safe chat assistants. Our hypothesis is that establishing an adequate output space can enable such a transition given the capabilities inherent in pre-trained LLMs. To verify this, we propose Response Tuning (RT), which eliminates the instruction-conditioning step in instruction tuning and solely focuses on response space supervision. Our experiments demonstrate that RT models, trained only using responses, can effectively respond to a wide range of instructions and exhibit helpfulness comparable to that of their instruction-tuned counterparts. Furthermore, we observe that controlling the training response distribution can significantly improve their user preference or elicit target behaviors such as refusing assistance for unsafe queries. Our findings illuminate the role of establishing an adequate output space in alignment, highlighting the potential of the extensive inherent capabilities of pre-trained LLMs.
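As an illustration of the distribution-control idea mentioned above, the following sketch (an assumption, not taken from the paper) mixes a chosen fraction of refusal-style responses into a response-only training set, so that refusing unsafe requests becomes part of the supervised response space. Function and parameter names are hypothetical.

```python
# Illustrative sketch of controlling the training response distribution.
# Names and the mixing recipe are hypothetical, not prescribed by the paper.
import random


def mix_response_distribution(helpful_responses, refusal_responses,
                              refusal_fraction=0.1, seed=0):
    """Return a response-only training list in which roughly
    `refusal_fraction` of the examples are safe-refusal responses."""
    rng = random.Random(seed)
    n_refusals = int(len(helpful_responses) * refusal_fraction)
    mixed = list(helpful_responses) + rng.choices(refusal_responses, k=n_refusals)
    rng.shuffle(mixed)
    return mixed


# Example usage: the resulting list would feed build_response_tuning_example above.
training_responses = mix_response_distribution(
    helpful_responses=["Here is a step-by-step explanation ..."],
    refusal_responses=["I can't help with that request."],
    refusal_fraction=0.1,
)
```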