Estimating the Hallucination Rate of Generative AI
Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David Blei
2024-06-14

Summary
This paper introduces a method for estimating how often generative AI models, such as large language models, produce incorrect or unsupported outputs, known as 'hallucinations.' The focus is on in-context learning (ICL), where a model is prompted with a dataset and asked to make predictions based on that dataset.
What's the problem?
Generative AI models can sometimes create responses that are not based on reality or the data they were trained on. These incorrect outputs, known as hallucinations, can be misleading and reduce the reliability of AI systems. Understanding how often these hallucinations occur is crucial for improving these models.
What's the solution?
The authors propose a method that takes an ICL problem, consisting of a model, a dataset, and a prediction question, and estimates the probability that the model will generate a hallucination. Following the paper's Bayesian interpretation of ICL, they define a hallucination as a generated prediction that has low probability under the true latent process assumed to have produced the data. The method only requires sampling queries and responses from the model and evaluating the log probability the model assigns to those responses (see the sketch below). They evaluate the approach on synthetic regression tasks and natural language ICL tasks using large language models.
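A minimal sketch of that sampling-and-scoring recipe is below. The `model` object with `sample` and `log_prob` methods and the probability threshold are hypothetical, introduced only for illustration; this shows the general Monte Carlo idea of drawing responses and flagging low-probability ones, not the authors' exact estimator.

```python
import math

def estimate_hallucination_rate(model, dataset, query,
                                n_samples=100,
                                log_prob_threshold=math.log(1e-3)):
    """Monte Carlo sketch: sample responses to `query` given `dataset`,
    score each with the model's own log probability, and count how often
    a response falls below a low-probability threshold.

    `model.sample` and `model.log_prob` are assumed interfaces, not part
    of any specific library; the threshold value is likewise illustrative.
    """
    n_flagged = 0
    for _ in range(n_samples):
        # Draw a candidate response from the model's predictive
        # distribution, conditioned on the in-context dataset and query.
        response = model.sample(context=dataset, query=query)

        # Evaluate the log probability the model assigns to its own response.
        logp = model.log_prob(response, context=dataset, query=query)

        # Treat responses the predictive distribution considers unlikely
        # as potential hallucinations.
        if logp < log_prob_threshold:
            n_flagged += 1

    return n_flagged / n_samples
```

In practice the number of samples and the threshold would be tuned to the task, and the scoring distribution used in the paper is more involved than the model's own predictive shown here; the sketch is only meant to convey the generate-then-score structure of the estimate.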
Why it matters?
This research is important because it helps identify and quantify the problem of hallucinations in generative AI. By developing a way to estimate how often these errors occur, researchers and developers can work towards creating more reliable AI systems. This is especially critical as AI technologies become more integrated into everyday life and are used for important tasks like decision-making and information retrieval.
Abstract
This work is about estimating the hallucination rate for in-context learning (ICL) with Generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and asked to make a prediction based on that dataset. The Bayesian interpretation of ICL assumes that the CGM is calculating a posterior predictive distribution over an unknown Bayesian model of a latent parameter and data. With this perspective, we define a hallucination as a generated prediction that has low probability under the true latent parameter. We develop a new method that takes an ICL problem -- that is, a CGM, a dataset, and a prediction question -- and estimates the probability that a CGM will generate a hallucination. Our method only requires generating queries and responses from the model and evaluating its response log probability. We empirically evaluate our method on synthetic regression and natural language ICL tasks using large language models.
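To make the abstract's Bayesian framing concrete, one way to write it down is given below; the notation ($D_n$, $\theta$, $\varepsilon$) is assumed here for illustration and may differ from the definitions in the paper.

```latex
% Bayesian reading of ICL: the model's predictive marginalizes a latent parameter.
% Notation is illustrative; see the paper for the authors' exact definitions.
\[
  p(y \mid x, D_n) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid D_n)\, d\theta,
  \qquad D_n = \{(x_i, y_i)\}_{i=1}^{n}.
\]
% A generated response y is called a hallucination if it has low probability
% under the true latent parameter theta*, for example
\[
  p(y \mid x, \theta^{*}) \;<\; \varepsilon .
\]
% The quantity of interest is then the probability that a response drawn from
% the model's predictive is a hallucination, averaged over the posterior:
\[
  \mathrm{HR}(x, D_n) \;=\; \int
  \Pr_{y \sim p(\cdot \mid x, D_n)}\!\bigl[\, p(y \mid x, \theta) < \varepsilon \,\bigr]\,
  p(\theta \mid D_n)\, d\theta .
\]
```

The point of the last expression is that the hallucination rate depends only on quantities the model itself can supply, namely samples from its predictive distribution and the log probabilities it assigns to them, which is what makes the estimation procedure described in the abstract possible.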