QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
Dehai Min, Kailin Zhang, Tongtong Wu, Lu Cheng
2025-12-23
Summary
This paper introduces a new method, QuCo-RAG, to improve the accuracy of large language models (LLMs) by reducing instances where they 'hallucinate' or make up information. It focuses on deciding *when* an LLM should look up external information, using objective statistics from the pre-training corpus rather than the model's potentially flawed internal confidence.
What's the problem?
Large language models are prone to hallucinations: they sometimes confidently state incorrect information. Existing methods for deciding when to retrieve external information rely on the model's *own* assessment of how sure it is about its answer. However, LLMs are often poorly calibrated; they can be highly confident even when they are wrong, which makes these internal signals unreliable.
What's the solution?
QuCo-RAG tackles this by shifting away from the model's self-assessment and instead using objective statistics from the massive corpus the model was originally trained on. It works in two stages: first, before generation, it identifies rare (low-frequency) entities in the question, which suggest the model may lack the relevant long-tail knowledge. Second, during generation, it checks whether the entities the model produces ever co-occur with the question's entities in the training data; if they never appear together, that signals a higher risk of hallucination and triggers retrieval of relevant information from an external source. These corpus lookups are made fast (millisecond latency over trillions of tokens) by an n-gram lookup engine called Infini-gram.
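A minimal sketch of the first stage (the pre-generation check) is shown below, assuming the public Infini-gram count API and spaCy for entity extraction. The index name, frequency threshold, and helper names (`corpus_count`, `needs_retrieval_before_generation`) are illustrative placeholders, not the paper's actual implementation.

```python
# Sketch of stage 1: flag questions whose entities are rare in the pre-training corpus.
# Assumes an Infini-gram-style count service; the index name and FREQ_THRESHOLD are
# illustrative placeholders, not the paper's exact configuration.
import requests
import spacy

INFINIGRAM_API = "https://api.infini-gram.io/"  # public Infini-gram endpoint
INDEX = "v4_dolma-v1_7_llama"                   # placeholder corpus index name
FREQ_THRESHOLD = 100                            # illustrative rarity cutoff

# Assumes the small English spaCy model is installed (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

def corpus_count(phrase: str) -> int:
    """Return how often `phrase` occurs in the indexed pre-training corpus."""
    resp = requests.post(INFINIGRAM_API, json={
        "index": INDEX,
        "query_type": "count",
        "query": phrase,
    })
    return resp.json().get("count", 0)

def needs_retrieval_before_generation(question: str) -> bool:
    """Trigger retrieval if any named entity in the question is low-frequency,
    i.e., likely long-tail knowledge the model has rarely seen."""
    entities = [ent.text for ent in nlp(question).ents]
    return any(corpus_count(e) < FREQ_THRESHOLD for e in entities)
```

In practice the threshold and the choice of entity extractor would need tuning; the key idea is that the decision comes from corpus counts, not from the model's own confidence.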
Why it matters?
This research matters because it provides a more reliable way to improve LLM accuracy without changing the model itself. The approach is practically model-agnostic: it works with different LLMs, including those whose pre-training data is not publicly disclosed (such as Llama, Qwen, and GPT models). By grounding retrieval decisions in verifiable statistics from a pre-training corpus, QuCo-RAG significantly reduces hallucinations and improves performance on complex question-answering tasks, including specialized domains like biomedicine.
Abstract
Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to mitigate hallucinations in large language models (LLMs). However, existing methods rely on model-internal signals (e.g., logits, entropy), which are fundamentally unreliable because LLMs are typically ill-calibrated and often exhibit high confidence in erroneous outputs. We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data. Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk. Both stages leverage Infini-gram for millisecond-latency queries over 4 trillion tokens, triggering retrieval when uncertainty is high. Experiments on multi-hop QA benchmarks show QuCo-RAG achieves EM gains of 5--12 points over state-of-the-art baselines with OLMo-2 models, and transfers effectively to models with undisclosed pre-training data (Llama, Qwen, GPT), improving EM by up to 14 points. Domain generalization on biomedical QA further validates the robustness of our paradigm. These results establish corpus-grounded verification as a principled, practically model-agnostic paradigm for dynamic RAG. Our code is publicly available at https://github.com/ZhishanQ/QuCo-RAG.
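For the second stage described in the abstract, a hedged sketch of the co-occurrence check might look like the following. It assumes the Infini-gram web API accepts AND-composed (CNF) count queries; the index name, the pairing of question and generated entities, and the function names are illustrative, and the paper's exact query construction may differ.

```python
# Sketch of stage 2: during generation, verify that each newly generated entity
# co-occurs with at least one question entity somewhere in the pre-training corpus.
# Zero co-occurrence is treated as a hallucination signal that triggers retrieval.
# Assumption: the Infini-gram web API accepts "A AND B" (CNF) count queries.
import requests
from typing import Iterable

INFINIGRAM_API = "https://api.infini-gram.io/"
INDEX = "v4_dolma-v1_7_llama"  # placeholder corpus index name

def cooccurrence_count(entity_a: str, entity_b: str) -> int:
    """Approximate count of corpus passages containing both entities."""
    resp = requests.post(INFINIGRAM_API, json={
        "index": INDEX,
        "query_type": "count",
        "query": f"{entity_a} AND {entity_b}",  # CNF query: both terms must appear
    })
    return resp.json().get("count", 0)

def should_retrieve_during_generation(question_entities: Iterable[str],
                                      generated_entities: Iterable[str]) -> bool:
    """Flag hallucination risk if a generated entity never co-occurs with any
    question entity in the pre-training corpus."""
    question_entities = list(question_entities)
    for gen_ent in generated_entities:
        if all(cooccurrence_count(q, gen_ent) == 0 for q in question_entities):
            return True  # unsupported link -> pause generation and retrieve evidence
    return False
```

The point of the sketch is the decision rule: an entity pair that never appears together in trillions of training tokens is treated as objective evidence of uncertainty, regardless of how confident the model's logits look.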