Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent
2024-07-23

Summary
This paper introduces Conditioned Language Policy (CLP), a framework for finetuning language models (LMs) to balance multiple objectives. It shows how a single model can be trained to trade off different goals, such as creativity and safety, and steered between those goals at inference time, without building a separate model for each objective.
What's the problem?
Language models are expected to do well on several fronts at once, like generating creative content while keeping responses safe, and these goals often conflict. Existing methods either optimize for a single objective or require training and maintaining a separate model for each desired trade-off, which makes it hard to strike the right balance efficiently. This limits the flexibility and effectiveness of LMs in real-world applications.
What's the solution?
To address this, the authors develop CLP, which trains a single model to handle multiple objectives at once. Instead of training a different model for each goal, CLP builds on multi-task training and parameter-efficient finetuning: the policy is conditioned on a weighting over the objectives, so it can be steered toward different trade-offs at inference time while maintaining strong performance across all objectives (see the sketch below). The authors' experiments show that CLP outperforms existing multi-objective finetuning methods and provides better control over the trade-offs between goals.
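To make the idea concrete, here is a minimal, self-contained sketch of weight-conditioned multi-objective finetuning: the policy takes an explicit weight vector over the objectives as input, training samples a fresh weight vector each update, and the policy is optimized against the correspondingly weighted reward. This is not the authors' implementation; the toy policy network, the REINFORCE-style update, and the placeholder reward_creativity / reward_safety functions are illustrative assumptions, and the paper's actual architecture, conditioning mechanism, and RL algorithm differ.

```python
# Illustrative sketch only: a tiny weight-conditioned policy trained on a
# weighted combination of two placeholder rewards. Not the CLP implementation.
import torch
import torch.nn as nn

VOCAB, HIDDEN, SEQ_LEN = 32, 64, 8

class ConditionedPolicy(nn.Module):
    """Toy policy whose next-token logits depend on a vector of objective weights."""
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(VOCAB, HIDDEN)
        self.weight_proj = nn.Linear(2, HIDDEN)  # embeds the 2-dim trade-off weights
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens, weights):
        # Pool token embeddings and add the embedded weight vector as conditioning.
        h = self.token_emb(tokens).mean(dim=1) + self.weight_proj(weights)
        return self.head(torch.tanh(h))  # next-token logits

def reward_creativity(seq):
    # Placeholder reward: favors sequences with many distinct tokens.
    return seq.unique().numel() / SEQ_LEN

def reward_safety(seq):
    # Placeholder reward: favors sequences of low token ids.
    return 1.0 - seq.float().mean().item() / VOCAB

policy = ConditionedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    # Multi-task over trade-offs: sample a fresh weight vector each update.
    w = torch.rand(1)
    weights = torch.cat([w, 1.0 - w]).unsqueeze(0)  # shape (1, 2), sums to 1

    # Sample a short sequence autoregressively, conditioned on the weights.
    tokens = torch.zeros(1, 1, dtype=torch.long)
    log_probs = []
    for _ in range(SEQ_LEN):
        dist = torch.distributions.Categorical(logits=policy(tokens, weights))
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens = torch.cat([tokens, tok.unsqueeze(0)], dim=1)
    seq = tokens[0, 1:]

    # Scalarize the objectives with the same weights the policy was conditioned on.
    reward = weights[0, 0] * reward_creativity(seq) + weights[0, 1] * reward_safety(seq)

    # REINFORCE-style update toward the weighted reward.
    loss = -torch.stack(log_probs).sum() * reward
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time, steering is just a choice of weights, e.g. weights = [[0.9, 0.1]]
# for a more "creative" policy or [[0.1, 0.9]] for a "safer" one, with no retraining.
```

Because the trade-off weights are an input to the policy, steering at inference time amounts to choosing a different weight vector rather than retraining or maintaining multiple models.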
Why it matters?
This research matters because it makes language models more versatile and effective across applications. By letting a single model manage multiple objectives, CLP gives developers control over how an AI system balances goals such as creativity and safety when interacting with users. This could lead to better AI systems in customer service, content creation, and other fields where balancing different needs is crucial.
Abstract
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP can learn steerable models that effectively trade-off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through an extensive set of experiments and ablations, we show that the CLP framework learns steerable models that outperform and Pareto-dominate the current state-of-the-art approaches for multi-objective finetuning.