Multi-property Steering of Large Language Models with Dynamic Activation Composition

Daniel Scalena, Gabriele Sarti, Malvina Nissim

2024-06-26

Summary

This paper introduces Dynamic Activation Composition, a method for steering the text generated by large language models (LLMs) toward one or more properties at once. It adjusts how strongly each property is enforced as generation proceeds, so the output keeps the desired properties while remaining fluent.

What's the problem?

Previous activation steering methods were evaluated on a single conditioning property at a time and in synthetic settings. Such evaluations say little about real-world applications, where several properties must be controlled simultaneously, and they leave open how steering parameters should be chosen for each property.

What's the solution?

The authors conducted a comprehensive evaluation of activation steering strategies, which condition generation by additively intervening on a model's intermediate representations. They found that the optimal steering parameters depend on the property being steered. To address this, they propose Dynamic Activation Composition, an information-theoretic approach that modulates the steering intensity of one or more properties throughout generation. In their multi-property experiments, the method maintains strong conditioning while minimizing the impact of steering on fluency.
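As a rough illustration of additive steering with several properties, the sketch below adds a weighted sum of steering vectors to one hidden-state vector. The function name `compose_steering`, the property names, and the toy numbers are hypothetical and are not taken from the paper; a real setup would apply the same idea to the activations of an actual model.

```python
def compose_steering(hidden, steering_vectors, alphas):
    """Return hidden + sum over properties of alphas[p] * steering_vectors[p].

    hidden: list of floats (one intermediate representation).
    steering_vectors: dict mapping a property name to a list of floats.
    alphas: dict mapping a property name to its steering intensity.
    All names and shapes here are illustrative assumptions.
    """
    steered = list(hidden)  # copy so the input is left untouched
    for name, vector in steering_vectors.items():
        alpha = alphas.get(name, 0.0)  # properties without an intensity are skipped
        for i, component in enumerate(vector):
            steered[i] += alpha * component
    return steered


# Toy example with two hypothetical properties.
hidden = [0.5, -1.0, 2.0]
steering_vectors = {
    "property_a": [1.0, 0.0, 0.0],
    "property_b": [0.0, 2.0, 0.0],
}
alphas = {"property_a": 2.0, "property_b": 0.5}
steered = compose_steering(hidden, steering_vectors, alphas)
```

Keeping one intensity per property is what lets each property be strengthened or weakened independently during generation.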

Why does it matter?

This research matters because practical applications often need several properties of a model's output controlled at once. By steering multiple properties simultaneously while preserving fluency, the method could benefit applications such as customer service, content creation, and interactive storytelling.

Abstract

Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
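The abstract describes the approach only as information-theoretic, so the following is a minimal sketch under an assumption: the steering intensity for the next token is set from the KL divergence between the steered and unsteered next-token distributions, capped at a maximum. The function names, the direction of the divergence, and the default `alpha_max` are illustrative choices, not details confirmed by the text above.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def dynamic_alpha(steered_logits, unsteered_logits, alpha_max=2.0):
    """Assumed rule: intensity grows with the divergence, capped at alpha_max.

    When steering barely changes the next-token distribution, the
    intensity is close to zero; when it changes it a lot, the intensity
    saturates at alpha_max.
    """
    p = softmax(steered_logits)
    q = softmax(unsteered_logits)
    return min(kl_divergence(p, q), alpha_max)


# Toy next-token logits over a three-token vocabulary.
same = dynamic_alpha([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mild = dynamic_alpha([1.0, 2.0, 3.5], [1.0, 2.0, 3.0])
strong = dynamic_alpha([10.0, 0.0, 0.0], [0.0, 0.0, 10.0])
```

With one such intensity computed per property at each generation step, the resulting values could be passed to an additive intervention, giving a different steering strength for every property and every token.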
