FaithLens: Detecting and Explaining Faithfulness Hallucination

Shuzheng Si, Qingyi Wang, Haozhe Zhao, Yuzhuo Bai, Guanqiao Chen, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun

2025-12-24

Summary

This paper introduces FaithLens, a new model designed to identify when large language models (LLMs) are 'hallucinating' – essentially, making things up or stating things that aren't supported by the source material. It not only detects these inaccuracies but also explains *why* it thinks something is incorrect, aiming to build more trust in LLM outputs.
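To make the task concrete, a detector like FaithLens can be viewed as a function that takes a source document and a model output and returns a faithful/hallucinated verdict plus an explanation. The sketch below is illustrative only; the prompt wording, JSON schema, and helper names are assumptions for explanation, not the paper's actual interface.

```python
# Illustrative sketch only: the prompt wording, JSON schema, and helper
# names here are assumptions, not FaithLens's actual interface.
import json

DETECTION_PROMPT = """You are a faithfulness checker.

Source document:
{source}

Model output to verify:
{claim}

Reply with JSON: {{"faithful": true or false, "explanation": "..."}}"""


def detect_hallucination(llm_generate, source: str, claim: str) -> dict:
    """Ask a detector model whether `claim` is supported by `source`.

    `llm_generate` is any callable mapping a prompt string to a completion
    string (for example, a wrapper around an 8B detector model).
    """
    prompt = DETECTION_PROMPT.format(source=source, claim=claim)
    raw = llm_generate(prompt)
    return json.loads(raw)  # expects {"faithful": bool, "explanation": str}


# Example with a stubbed model that always flags a hallucination:
if __name__ == "__main__":
    stub = lambda _prompt: (
        '{"faithful": false, '
        '"explanation": "The output cites a figure that never appears in the source."}'
    )
    print(detect_hallucination(stub, source="...", claim="..."))
```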

What's the problem?

Large language models are powerful, but they sometimes confidently present information that isn't true or isn't supported by the material they were given. This is a big problem: if you're using an LLM to summarize articles or answer questions about documents, you need to be sure the output is reliable. And simply knowing *that* an LLM is wrong isn't enough; you need to understand *why* in order to fix the issue or decide not to use the output.

What's the solution?

The researchers built FaithLens by first generating a large dataset of examples in which LLM outputs are either faithful (supported by the source) or contain hallucinations, with an explanation attached to each case. They used other advanced LLMs to help create this data and filtered it to keep only examples with correct labels, high-quality explanations, and good diversity. They then fine-tuned FaithLens on this curated data as a 'cold start' and refined it with rule-based reinforcement learning, where the reward encourages both correct hallucination verdicts and high-quality explanations.
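As a rough picture of that second training stage, here is a minimal sketch of a rule-based reward that combines the two signals. The 0.7/0.3 weighting and the explanation-quality scorer are assumptions; the paper's exact reward definition may differ.

```python
# Minimal sketch of a rule-based reward combining verdict correctness and
# explanation quality. The weights and the explanation scorer are assumptions,
# not FaithLens's published reward definition.

def prediction_reward(predicted_faithful: bool, gold_faithful: bool) -> float:
    """1.0 for a correct faithful/hallucinated verdict, 0.0 otherwise."""
    return 1.0 if predicted_faithful == gold_faithful else 0.0


def explanation_reward(explanation: str, score_explanation) -> float:
    """Clamp an explanation-quality score to [0, 1].

    `score_explanation` stands in for whatever judge is used in practice,
    e.g. an LLM judge or overlap with a reference explanation.
    """
    return max(0.0, min(1.0, score_explanation(explanation)))


def total_reward(predicted_faithful: bool, gold_faithful: bool,
                 explanation: str, score_explanation,
                 w_pred: float = 0.7, w_expl: float = 0.3) -> float:
    """Weighted sum of both signals; the 0.7/0.3 split is illustrative."""
    return (w_pred * prediction_reward(predicted_faithful, gold_faithful)
            + w_expl * explanation_reward(explanation, score_explanation))
```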

Why it matters?

FaithLens matters because, despite having only 8 billion parameters, it outperforms advanced models such as GPT-4.1 and o3 at detecting faithfulness hallucinations, and it does so efficiently. The ability to not only flag inaccuracies but also *explain* them is a significant step towards making LLMs more trustworthy and useful in applications where accuracy is critical, such as research, journalism, or providing information to the public.

Abstract

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
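For intuition about the data filtering step described above, the sketch below shows one way such a filter could be structured around the three stated goals (label correctness, explanation quality, data diversity). The specific checks and threshold are assumptions, not the paper's exact criteria.

```python
# Illustrative filtering pass over synthesized training examples. The three
# checks and the threshold are assumptions, not the paper's exact criteria.

def filter_examples(examples, verify_label, score_explanation,
                    is_near_duplicate, min_expl_score: float = 0.8):
    """Keep examples that pass three checks mirroring the abstract:
    label correctness, explanation quality, and data diversity.

    `verify_label`, `score_explanation`, and `is_near_duplicate` are
    placeholder callables (e.g. label re-checking with another LLM,
    an explanation judge, and near-duplicate detection).
    """
    kept = []
    for example in examples:
        if not verify_label(example):                         # label correctness
            continue
        if score_explanation(example) < min_expl_score:       # explanation quality
            continue
        if any(is_near_duplicate(example, k) for k in kept):  # data diversity
            continue
        kept.append(example)
    return kept
```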