HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation

Loris Bergeron, Ioana Buhnila, Jérôme François, Radu State

2025-10-08

Summary

This paper introduces HalluGuard, a smaller AI model designed to improve the reliability of larger language models by reducing instances where they 'hallucinate' or make up information.

What's the problem?

Large language models are really good at things like writing and answering questions, but they sometimes confidently state things that aren't true. This is a big problem because if you can't trust the information an AI gives you, it's hard to use it in important real-world situations. Specifically, when these models use information retrieved from other sources (a process called Retrieval-Augmented Generation or RAG), they can still generate incorrect statements not supported by the source material.

What's the solution?

The researchers created HalluGuard, a relatively small 4-billion-parameter model, to act as a 'fact-checker' for RAG systems. It works by looking at a document and a claim made by the larger language model and deciding whether the claim is actually supported by the document. HalluGuard was trained on a specially constructed dataset of both truthful and fabricated claims, using a preference-based training method (Odds Ratio Preference Optimization) that teaches the model to prefer correct reasoning. This allows it to perform almost as well as much larger, more complex fact-checking models.
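The paper does not publish its exact prompt or output format, but the document-claim judging step it describes can be sketched as follows. All function names and the prompt wording here are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of a HalluGuard-style verification step: a judge model
# receives a document and a claim, and must return a binary verdict plus an
# evidence-grounded justification. Prompt text and parsing are assumptions.

def build_verifier_prompt(document: str, claim: str) -> str:
    """Format a document-claim pair for a grounding judge model."""
    return (
        "You are a grounding verifier.\n"
        f"Document:\n{document}\n\n"
        f"Claim:\n{claim}\n\n"
        "Answer 'grounded' if the document supports the claim, "
        "'hallucinated' otherwise, then quote the supporting evidence."
    )

def parse_verdict(model_output: str) -> str:
    """Extract the binary label from the judge's free-text answer."""
    first_line = model_output.strip().splitlines()[0].lower()
    if "hallucinated" in first_line:
        return "hallucinated"
    return "grounded" if "grounded" in first_line else "hallucinated"
```

In a real pipeline, `build_verifier_prompt` would be sent to the small reasoning model and `parse_verdict` applied to its reply; the quoted evidence is what gives the system its transparency.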

Why does it matter?

HalluGuard is important because it shows you can significantly reduce hallucinations in large language models without needing a huge and expensive AI. By using a smaller, more efficient model, it makes trustworthy AI more accessible and practical for a wider range of applications. It also provides a way to understand *why* a model thinks something is true or false, increasing transparency.

Abstract

Large Language Models (LLMs) excel in many NLP tasks but remain prone to hallucinations, limiting trust in real-world applications. We present HalluGuard, a 4B-parameter Small Reasoning Model (SRM) for mitigating hallucinations in Retrieval-Augmented Generation (RAG). HalluGuard classifies document-claim pairs as grounded or hallucinated and produces evidence-grounded justifications for transparency. Our approach combines (i) a domain-agnostic synthetic dataset derived from FineWeb and refined through multi-stage curation and data reformation, (ii) synthetic grounded and hallucinated claims, and (iii) preference-based fine-tuning with Odds Ratio Preference Optimization to distill large-model reasoning into a smaller backbone. On the RAGTruth subset of the LLM-AggreFact benchmark, HalluGuard achieves 84.0% balanced accuracy (BAcc), rivaling the specialized models MiniCheck (7B; 84.0%) and Granite Guardian 3.3 (8B; 82.2%) while using roughly half their parameters. Over the full benchmark it reaches 75.7% BAcc, matching larger general-purpose LLMs such as GPT-4o (75.9%). We will release HalluGuard and datasets under Apache 2.0 upon acceptance.
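The metric quoted throughout the abstract, balanced accuracy, is the mean of per-class recall, which keeps a skewed grounded/hallucinated split from inflating the score. A minimal sketch of the standard computation:

```python
# Balanced accuracy: average recall over classes (here "grounded" vs
# "hallucinated"). Standard definition, not code from the paper.

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in set(y_true):
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)

# Example: one grounded claim misjudged, both hallucinated claims caught.
score = balanced_accuracy(["g", "g", "h", "h"], ["g", "h", "h", "h"])
# recall("g") = 0.5, recall("h") = 1.0, so score = 0.75
```

With a plain accuracy a model could score well by always predicting the majority class; balanced accuracy makes that strategy cap out at 50%, which is why hallucination-detection benchmarks report it.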