
Adaptive Decoding via Latent Preference Optimization

Shehzaad Dhuliawala, Ilia Kulikov, Ping Yu, Asli Celikyilmaz, Jason Weston, Sainbayar Sukhbaatar, Jack Lanchantin

2024-11-19

Summary

This paper introduces Adaptive Decoding, a new method for language models that dynamically adjusts the sampling temperature during text generation to improve performance on different tasks.

What's the problem?

Language models typically use a fixed temperature for sampling when generating text. A higher temperature can lead to more creative and varied responses, while a lower temperature results in more accurate and factual outputs. However, using a single fixed temperature for all tasks can limit the model's ability to adapt to different needs, such as creativity versus accuracy.
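
To make the temperature knob concrete, here is a minimal sketch of standard temperature sampling (a generic illustration, not code from the paper): the logits are divided by the temperature before the softmax, so a low temperature concentrates probability on the top token while a high temperature spreads it out.

```python
# Standard temperature sampling, written with PyTorch purely for illustration.
import torch

def sample_with_temperature(logits: torch.Tensor, temperature: float) -> int:
    """Sample one token id from next-token logits at the given temperature."""
    probs = torch.softmax(logits / max(temperature, 1e-6), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # toy next-token logits
sample_with_temperature(logits, 0.3)  # almost always returns 0 (the top token)
sample_with_temperature(logits, 1.5)  # spreads probability across all four tokens
```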

What's the solution?

The authors propose Adaptive Decoding, a new layer added to the model that selects the sampling temperature dynamically for each example or even each individual token. To train this layer they develop Latent Preference Optimization (LPO), which lets the model learn from preference data which temperature choices lead to better responses. This adaptive approach outperforms fixed-temperature decoding across a range of tasks, including creative story writing and mathematical problem solving.
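
A minimal, hypothetical PyTorch sketch of how such a layer could work, assuming a small classifier over the model's final hidden state that picks one temperature per token from a fixed candidate set; the class name, candidate temperatures, and greedy fallback at T = 0 are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Assumed discrete set of candidate temperatures the head can choose from.
TEMPERATURES = torch.tensor([0.0, 0.4, 0.8, 1.2])

class AdaptiveDecodingHead(nn.Module):
    """Hypothetical head mapping a hidden state to a distribution over temperatures."""

    def __init__(self, hidden_size: int, num_temps: int = len(TEMPERATURES)):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_temps)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.classifier(hidden_state), dim=-1)

def adaptive_sample(logits: torch.Tensor, hidden_state: torch.Tensor,
                    head: AdaptiveDecodingHead) -> int:
    """Pick a temperature for this token, then sample the token with it."""
    temp_probs = head(hidden_state)                        # distribution over candidates
    temp = TEMPERATURES[torch.multinomial(temp_probs, 1)]  # sample one temperature
    if temp.item() == 0.0:
        return int(torch.argmax(logits))                   # treat T = 0 as greedy decoding
    probs = torch.softmax(logits / temp, dim=-1)
    return int(torch.multinomial(probs, 1))
```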

Why it matters?

This research is important because it enhances how language models generate text by allowing them to be more flexible and responsive to different types of tasks. By optimizing the sampling process, Adaptive Decoding can lead to better-quality outputs, making AI tools more effective for users in creative and factual contexts.

Abstract

During language model decoding, it is known that higher-temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact-seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO), a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.
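
As a rough illustration of how preferences could train such a discrete latent, here is a hedged sketch of a simplified DPO-style loss over the temperature indices selected while generating a preferred versus a dispreferred response; it omits a reference-policy term and is only an assumption about the general shape of the objective, not the paper's exact LPO formulation.

```python
import torch
import torch.nn.functional as F

def lpo_style_loss(temp_logits_chosen: torch.Tensor,    # (T_c, num_temps) head logits
                   chosen_idx: torch.Tensor,            # (T_c,) long, temps picked in preferred response
                   temp_logits_rejected: torch.Tensor,  # (T_r, num_temps) head logits
                   rejected_idx: torch.Tensor,          # (T_r,) long, temps picked in dispreferred response
                   beta: float = 0.1) -> torch.Tensor:
    """Raise the likelihood of the temperature choices behind the preferred response."""
    logp_chosen = F.log_softmax(temp_logits_chosen, dim=-1) \
        .gather(-1, chosen_idx.unsqueeze(-1)).sum()
    logp_rejected = F.log_softmax(temp_logits_rejected, dim=-1) \
        .gather(-1, rejected_idx.unsqueeze(-1)).sum()
    # Push the chosen temperature trajectory above the rejected one, DPO-style.
    return -F.logsigmoid(beta * (logp_chosen - logp_rejected))
```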