
Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal

2024-06-18


Summary

This paper introduces a new method called Bayesian Active Learner for Preference Modeling (BAL-PM) that improves how large language models (LLMs) learn from human preferences. It focuses on selecting the most informative data points to ask people about, making preference labeling more efficient and cost-effective.

What's the problem?

While using human feedback to guide LLMs has been successful, selecting and labeling data is slow and expensive, especially at large scale. It is hard to decide which examples are worth the labeling budget, so much of the feedback that does get collected ends up being redundant rather than genuinely informative for improving the model.

What's the solution?

To tackle this issue, the authors developed BAL-PM, a smarter way to choose which data points to label with human preferences. Instead of picking random samples or relying on simple uncertainty-based methods that tend to make repetitive choices, BAL-PM targets points where the preference model is uncertain while also spreading its picks across a wide range of prompts, so the feedback collected is both diverse and informative. Their experiments on two popular human preference datasets showed that BAL-PM needed 33% to 68% fewer preference labels than previous approaches and outperformed earlier stochastic Bayesian acquisition policies.
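To make the idea concrete, here is a minimal, hypothetical Python sketch of how such an acquisition score could combine the two ingredients described above: an epistemic-uncertainty term from an ensemble of preference predictions and an entropy term over the prompt features of the already-acquired set. The function names, the BALD-style uncertainty estimate, the nearest-neighbour entropy proxy, the weight beta, and the greedy argmax selection are all illustrative assumptions, not the authors' exact implementation (the paper's actual policy is stochastic and works in the feature space of the employed LLM, with its own estimators).

```python
import numpy as np

def epistemic_uncertainty(ensemble_probs):
    """BALD-style estimate: entropy of the mean prediction minus the mean
    entropy of the individual ensemble members' predictions."""
    mean_p = ensemble_probs.mean(axis=0)                                   # (N, C)
    entropy_of_mean = -np.sum(mean_p * np.log(mean_p + 1e-12), axis=-1)    # (N,)
    member_entropy = -np.sum(ensemble_probs * np.log(ensemble_probs + 1e-12), axis=-1)  # (M, N)
    return entropy_of_mean - member_entropy.mean(axis=0)                   # (N,)

def knn_entropy(feats, k=5):
    """Simple nearest-neighbour entropy proxy: mean log distance to the
    k-th nearest neighbour (constants dropped)."""
    if len(feats) <= k:
        return 0.0
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return float(np.mean(np.log(kth + 1e-12)))

def select_next_prompt(pool_feats, ensemble_probs, acquired_feats, beta=1.0):
    """Score each candidate by its epistemic uncertainty plus the entropy its
    prompt features would add to the already-acquired set, then pick the best."""
    uncertainty = epistemic_uncertainty(ensemble_probs)                    # (N,)
    base_entropy = knn_entropy(acquired_feats)
    entropy_gain = np.array([
        knn_entropy(np.vstack([acquired_feats, f[None, :]])) - base_entropy
        for f in pool_feats
    ])
    scores = uncertainty + beta * entropy_gain
    return int(np.argmax(scores)), scores

# Toy usage with random stand-ins for LLM prompt embeddings and an
# 8-member ensemble of binary preference predictions.
rng = np.random.default_rng(0)
pool_feats = rng.normal(size=(100, 16))
ensemble_probs = rng.dirichlet([1.0, 1.0], size=(8, 100))   # shape (8, 100, 2)
acquired_feats = rng.normal(size=(10, 16))
best_idx, _ = select_next_prompt(pool_feats, ensemble_probs, acquired_feats)
print("next prompt to label:", best_idx)
```

The key design point the sketch tries to convey is the second term: even when the model is uncertain about many near-duplicate prompts, the entropy bonus pushes the learner to spread its labeling budget across different regions of the prompt space instead of asking about the same kind of example over and over.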

Why it matters?

This research is important because it enhances how LLMs can learn from human feedback, making the training process more efficient. By reducing the number of labels needed, BAL-PM not only saves time and resources but also helps improve the overall quality of LLMs. This advancement can lead to better AI systems that understand human preferences more accurately, benefiting applications in customer service, content creation, and many other fields.

Abstract

Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.