ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

2024-07-09

Summary

This paper presents ANAH-v2, a system designed to improve how we identify and manage hallucinations in large language models (LLMs) at scale. Hallucinations are incorrect or misleading statements that these models sometimes produce, especially when answering complex, long-form questions.

What's the problem?

The main problem is that LLMs often generate hallucinations: plausible-sounding but false statements. Existing datasets for detecting and mitigating these hallucinations are limited in size and domain coverage, which makes it hard to address the issue effectively. They are also difficult to scale, because human annotation is expensive and current automatic hallucination annotators are not reliable enough.

What's the solution?

To tackle this problem, the authors introduce an iterative self-training framework that expands the dataset used for annotating hallucinations while also improving the accuracy of the annotator. Based on the Expectation Maximization (EM) algorithm, each iteration uses the current annotator to label a larger set of data and then trains a more accurate annotator on that data, which drives the next round of annotation. The final annotator, with only 7B parameters, outperforms even advanced models like GPT-4 and achieves state-of-the-art hallucination detection results on benchmarks such as HaluEval and HalluQA.
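The overall loop can be summarized in a short sketch. The code below is an illustrative outline of that EM-style iteration, not the paper's actual implementation: the callables sample_unlabeled, annotate, and train_annotator, as well as the linear growth schedule for the batch size, are assumptions standing in for the paper's annotation pipeline and fine-tuning procedure.

```python
from typing import Any, Callable, List, Tuple

def iterative_self_training(
    seed_annotator: Any,
    sample_unlabeled: Callable[[int], List[Any]],      # draws a batch of unlabeled model responses (assumed helper)
    annotate: Callable[[Any, List[Any]], List[Any]],   # labels a batch with the given annotator (assumed helper)
    train_annotator: Callable[[List[Any]], Any],       # fine-tunes a new annotator on labeled data (assumed helper)
    num_iterations: int = 3,
    base_batch_size: int = 1000,
) -> Tuple[Any, List[Any]]:
    """EM-style self-training: each iteration annotates more data (E-step)
    and trains a more accurate annotator on it (M-step)."""
    annotator = seed_annotator
    labeled_dataset: List[Any] = []

    for it in range(num_iterations):
        # E-step: use the current annotator to label a fresh, larger batch of outputs.
        # The linear growth schedule here is illustrative, not taken from the paper.
        batch = sample_unlabeled(base_batch_size * (it + 1))
        labeled_dataset.extend(annotate(annotator, batch))

        # M-step: train a new annotator on everything labeled so far;
        # it replaces the old one in the next iteration's annotation pipeline.
        annotator = train_annotator(labeled_dataset)

    return annotator, labeled_dataset
```

In the paper, the annotator produced by the final iteration is the 7B model that is then used for zero-shot hallucination detection and for mitigating hallucinations in LLM outputs.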

Why it matters?

This research is important because it enhances our ability to evaluate and mitigate hallucinations in LLMs, which is crucial for their reliability in real-world applications. By improving how we detect these issues, ANAH-v2 can help make AI systems more trustworthy and effective, particularly in tasks like question answering where accuracy is essential.

Abstract

Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, and struggle to scale due to prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled dataset and then trains a more accurate hallucination annotator on the dataset. This new hallucination annotator is adopted in the hallucination annotation pipeline used for the next iteration. Extensive experimental results demonstrate that the final hallucination annotator, with only 7B parameters, surpasses the performance of GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help to mitigate hallucinations in LLM generations, with the Natural Language Inference (NLI) metric increasing from 25% to 37% on HaluEval.