Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Lexiang Xiong, Qi Li, Jingwen Ye, Xinchao Wang

2026-03-17

Summary

This paper tackles the problem of Vision-Language Models (VLMs) making things up – essentially, confidently stating incorrect information. It introduces a new way to understand *why* these models hallucinate, moving beyond just identifying the wrong answers to figuring out what's happening inside the model's 'thought process' as it generates text.

What's the problem?

VLMs are powerful, but they often 'hallucinate,' meaning they generate responses that sound good but aren't actually true based on the input image or question. This is a huge problem because if we can't trust what these models say, it limits how much we can rely on them in real-world applications. Existing methods just flag the incorrect outputs, but don't explain *why* the model made the mistake.

What's the solution?

The researchers developed a framework that looks at how the model arrives at its answer, treating generation as a step-by-step 'cognitive trajectory.' They use mathematical tools from information theory to project this process onto a simplified 'Cognitive State Space,' where unusual patterns become visible. They found that when a model is about to hallucinate, its path through this space becomes geometrically abnormal, and that this geometric abnormality goes hand in hand with high information-theoretic 'surprisal' (the path is, in effect, improbable). This lets them detect hallucinations by flagging these geometric anomalies. They also pinpointed three specific types of internal failure that lead to hallucinations: instability in how the model perceives the image, breakdowns in logical reasoning, and uncertainty when committing to a decision.
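To make the idea concrete, here is a minimal sketch of "hallucination detection as geometric anomaly detection." Everything here is our illustrative assumption, not the paper's actual method: we invent simple geometric features of a trajectory (mean position, average step length, average turning angle) and score anomalies with a Mahalanobis distance fitted on calibration trajectories.

```python
import numpy as np

def trajectory_features(traj: np.ndarray) -> np.ndarray:
    """Summarize a (T, d) trajectory with simple geometric statistics:
    mean position, mean step length, and a mean turning-angle proxy.
    (These features are illustrative, not the paper's probes.)"""
    steps = np.diff(traj, axis=0)                 # (T-1, d) displacement vectors
    step_len = np.linalg.norm(steps, axis=1)      # per-step "speed"
    # Cosine of the angle between consecutive steps -> curvature proxy
    cos = np.sum(steps[:-1] * steps[1:], axis=1) / (
        step_len[:-1] * step_len[1:] + 1e-8)
    turn = np.arccos(np.clip(cos, -1.0, 1.0)).mean()
    return np.concatenate([traj.mean(axis=0), [step_len.mean(), turn]])

def fit_anomaly_scorer(calib_trajs):
    """Fit mean/covariance of the features on calibration runs,
    then score new trajectories by Mahalanobis distance."""
    F = np.stack([trajectory_features(t) for t in calib_trajs])
    mu = F.mean(axis=0)
    cov = np.cov(F, rowvar=False) + 1e-6 * np.eye(F.shape[1])  # regularized
    inv = np.linalg.inv(cov)
    def score(traj):
        d = trajectory_features(traj) - mu
        return float(d @ inv @ d)   # larger = more geometrically anomalous
    return score

# Synthetic stand-ins for cognitive trajectories: smooth random walks
# as "normal" generations, a high-variance walk as an "abnormal" one.
rng = np.random.default_rng(0)
calib = [np.cumsum(rng.normal(0, 0.1, (20, 3)), axis=0) for _ in range(50)]
score = fit_anomaly_scorer(calib)

s_normal = score(np.cumsum(rng.normal(0, 0.1, (20, 3)), axis=0))
s_erratic = score(np.cumsum(rng.normal(0, 1.0, (20, 3)), axis=0))
print(s_normal, s_erratic)  # the erratic trajectory scores far higher
```

The design point is that the detector never inspects the generated text itself; it only asks whether the internal trajectory looks geometrically typical relative to calibration data, which is why such an approach can work under weak supervision.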

Why it matters?

This work is important because it doesn't just detect hallucinations, it *diagnoses* them. By understanding the underlying causes of these errors – perceptual issues, reasoning failures, or decisional ambiguity – we can start to build more reliable and trustworthy AI systems. The framework is efficient, works even with imperfect data, and provides a way to trace errors back to specific problems within the model, ultimately making AI reasoning more transparent and auditable.
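The abstract below names entropy-based diagnostics (Perceptual Entropy, Decision Entropy). As a hedged illustration of the underlying quantity — the exact definitions in the paper are not reproduced here — Shannon entropy distinguishes a confident internal state from an ambiguous one:

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability distribution.
    Higher entropy = more uncertainty/ambiguity in the state."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                       # normalize defensively
    return float(-(p * np.log(p + eps)).sum())

# A confident state puts nearly all mass on one option;
# an ambiguous state spreads mass evenly (entropy near log(4)).
h_confident = shannon_entropy([0.97, 0.01, 0.01, 0.01])
h_ambiguous = shannon_entropy([0.25, 0.25, 0.25, 0.25])
print(h_confident, h_ambiguous)
```

In this spirit, a high entropy reading over visual interpretations would signal perceptual instability, while high entropy over final answer candidates would signal decisional ambiguity.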

Abstract

Vision-Language Models (VLMs) frequently "hallucinate" - generate plausible yet factually incorrect statements - posing a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations, recasting them from static output errors into dynamic pathologies of a model's computational cognition. Our framework is grounded in a normative principle of computational rationality, allowing us to model a VLM's generation as a dynamic cognitive trajectory. We design a suite of information-theoretic probes that project this trajectory onto an interpretable, low-dimensional Cognitive State Space. Our central discovery is a governing principle we term the geometric-information duality: a cognitive trajectory's geometric abnormality within this space is fundamentally equivalent to its high information-theoretic surprisal. Hallucination detection is thus recast as a geometric anomaly detection problem. Evaluated across diverse settings - from rigorous binary QA (POPE) and comprehensive reasoning (MME) to unconstrained open-ended captioning (MS-COCO) - our framework achieves state-of-the-art performance. Crucially, it operates with high efficiency under weak supervision and remains highly robust even when calibration data is heavily contaminated. This approach enables a causal attribution of failures, mapping observable errors to distinct pathological states: perceptual instability (measured by Perceptual Entropy), logical-causal failure (measured by Inferential Conflict), and decisional ambiguity (measured by Decision Entropy). Ultimately, this opens a path toward building AI systems whose reasoning is transparent, auditable, and diagnosable by design.