HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi
2025-01-15
Summary
This paper introduces HALoGEN, a new benchmark for finding and measuring 'hallucinations' in AI language models. Hallucinations are cases where an AI makes up false information that doesn't match real-world facts or the information it was given.
What's the problem?
AI language models are really good at writing text that sounds natural, but they sometimes make up false information. It's hard to catch these mistakes because having humans check everything the AI says is slow and expensive. We need a better way to find and understand these AI hallucinations.
What's the solution?
The researchers created HALoGEN, a large test suite for AI: 10,923 prompts covering nine domains, including programming, scientific attribution, and summarization. HALoGEN also includes automatic, high-precision checkers that break down what the AI says into small "atomic" units and verify each unit against a reliable knowledge source. The researchers used HALoGEN to test 14 different AI models, evaluating about 150,000 AI-generated responses, and found that even the best models hallucinate often (in some domains, up to 86% of the generated atomic facts were wrong). They also propose a new way to classify different types of AI mistakes based on where the errors likely come from.
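The verification idea described above (decompose a response into atomic units, check each against a knowledge source, and report the fraction unsupported) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the paper's actual verifier: the function names are assumptions, and the "decomposition" here is a naive sentence split standing in for the paper's much more careful atomic-fact extraction.

```python
# Illustrative sketch of HALoGEN's decompose-and-verify idea.
# All names are hypothetical; HALoGEN's real verifiers are far
# more sophisticated (domain-specific decomposition and checking).

def decompose(response: str) -> list[str]:
    """Naive stand-in for atomic-unit extraction:
    treat each sentence as one atomic unit."""
    return [s.strip() for s in response.split(".") if s.strip()]

def hallucination_score(response: str, knowledge: set[str]) -> float:
    """Fraction of atomic units NOT supported by the knowledge source."""
    units = decompose(response)
    if not units:
        return 0.0
    unsupported = [u for u in units if u not in knowledge]
    return len(unsupported) / len(units)

# Toy usage: one supported claim, one unsupported claim -> score 0.5.
knowledge = {
    "Paris is the capital of France",
    "The Seine flows through Paris",
}
response = "Paris is the capital of France. Paris was founded in 1850"
print(hallucination_score(response, knowledge))
```

In the real benchmark, the knowledge source is domain-specific (e.g. package indexes for code, citation databases for scientific attribution) rather than a simple set of strings.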
Why does it matter?
This matters because as AI becomes more common in our lives, we need to be able to trust what it tells us. HALoGEN gives researchers a powerful tool to study why AI makes things up and how often it happens. This could help make AI more reliable and trustworthy in the future. It's especially important for areas where accuracy is crucial, like in science or when AI is used to help make important decisions.
Abstract
Despite their impressive ability to generate high-quality and fluent text, generative large language models (LLMs) also produce hallucinations: statements that are misaligned with established world knowledge or provided input context. However, measuring hallucination can be challenging, as having humans verify model generations on-the-fly is both expensive and time-consuming. In this work, we release HALoGEN, a comprehensive hallucination benchmark consisting of: (1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and summarization, and (2) automatic high-precision verifiers for each use case that decompose LLM generations into atomic units, and verify each unit against a high-quality knowledge source. We use this framework to evaluate ~150,000 generations from 14 language models, finding that even the best-performing models are riddled with hallucinations (sometimes up to 86% of generated atomic facts depending on the domain). We further define a novel error classification for LLM hallucinations based on whether they likely stem from incorrect recollection of training data (Type A errors), or incorrect knowledge in training data (Type B errors), or are fabrication (Type C errors). We hope our framework provides a foundation to enable the principled study of why generative models hallucinate, and advances the development of trustworthy large language models.
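The abstract's three-way error taxonomy can be encoded compactly. The sketch below is purely illustrative (the enum and `classify` function are my own assumptions, not the paper's code): it distinguishes errors by whether the claim appears in the training data at all, and if so, whether the training data itself was correct.

```python
from enum import Enum

class HallucinationType(Enum):
    """Error taxonomy from the abstract (illustrative encoding)."""
    TYPE_A = "incorrect recollection of correct training data"
    TYPE_B = "incorrect knowledge present in the training data"
    TYPE_C = "fabrication not grounded in the training data"

def classify(in_training_data: bool, training_data_correct: bool) -> HallucinationType:
    """Hypothetical classifier following the abstract's definitions:
    absent from training data -> Type C (fabrication);
    present and correct there -> Type A (bad recollection);
    present but wrong there   -> Type B (bad source knowledge)."""
    if not in_training_data:
        return HallucinationType.TYPE_C
    if training_data_correct:
        return HallucinationType.TYPE_A
    return HallucinationType.TYPE_B
```

In practice, deciding whether a hallucinated fact was present in the training data requires access to (or search over) that data, which is why attributing errors to these types is itself a research challenge.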