Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang
2026-01-12
Summary
This paper investigates how well large language models (LLMs) maintain accurate and consistent answers when faced with slight changes to the questions or surrounding information. It finds that even if an LLM seems confident in an answer, that answer can easily be thrown off by minor disruptions, and proposes ways to make LLMs more reliable.
What's the problem?
Currently, we evaluate LLMs by checking if they give the same answer multiple times – this is called 'self-consistency'. However, this method doesn't reveal if the LLM's understanding is truly solid. The problem is that LLMs can appear consistent while actually being very sensitive to even small changes in the way a question is asked or the context provided. This means they might give correct answers in ideal situations but fail when things get a little more realistic and messy.
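The point-wise self-consistency check described above can be sketched in a few lines. This is an illustrative toy, not code from the paper: `ask_model` is a hypothetical stand-in for an LLM call, and the example shows how a model can look perfectly self-consistent while revealing nothing about its robustness.

```python
from collections import Counter

def self_consistency(ask_model, question, n_samples=8):
    """Sample the model several times on the same question and report
    the modal answer plus how often it appears (self-consistency)."""
    answers = [ask_model(question) for _ in range(n_samples)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / n_samples

# Toy stand-in for an LLM: it always returns the same answer, so
# self-consistency is perfect -- yet this says nothing about whether
# the answer survives rephrasing or distracting context.
answer, score = self_consistency(lambda q: "Paris", "Capital of France?")
print(answer, score)  # Paris 1.0
```

The point of the toy is exactly the paper's critique: a score of 1.0 here is a statement about repeated sampling of one prompt, not about the stability of the underlying belief.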
What's the solution?
The researchers developed a new way to test LLM reliability called 'Neighbor-Consistency Belief' (NCB). Instead of just checking for the same answer, NCB looks at how consistent the LLM is across similar, related questions – essentially, a 'neighborhood' of concepts. They also created a 'stress-test' to intentionally disrupt the LLM with contextual changes. Finally, they introduced a training method called 'Structure-Aware Training' (SAT) that helps the LLM build a more stable and consistent understanding of information, making it less likely to be fooled by these disruptions.
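The neighborhood idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual NCB metric: here a "neighborhood" is just a list of related question/expected-answer pairs, and the score is the fraction the model answers coherently.

```python
def neighbor_consistency(ask_model, neighborhood):
    """Hypothetical neighborhood-level consistency score: the fraction
    of related probes the model answers as expected. (The paper's real
    NCB measure may be defined differently.)"""
    agreements = [ask_model(q) == expected for q, expected in neighborhood]
    return sum(agreements) / len(agreements)

# Toy model that knows two phrasings of a fact but fails a third,
# logically equivalent probe -- a brittle, inconsistent "belief".
facts = {"Capital of France?": "Paris",
         "Which country has Paris as its capital?": "France"}
toy_model = lambda q: facts.get(q, "unknown")

neighborhood = [
    ("Capital of France?", "Paris"),
    ("Which country has Paris as its capital?", "France"),
    ("Is Paris the capital of France? (yes/no)", "yes"),  # model misses this
]
print(neighbor_consistency(toy_model, neighborhood))  # 2/3: brittle
```

A point-wise check on the first question alone would score this toy model perfectly; only probing the surrounding neighborhood exposes the inconsistency.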
Why it matters?
This work is important because as LLMs are used in more real-world applications, like providing information or making decisions, it's crucial that they are trustworthy. Simply being 'correct' sometimes isn't enough; they need to be consistently reliable even when faced with unexpected or slightly altered situations. Improving this reliability, as this paper aims to do, will make LLMs more useful and safe to deploy.
Abstract
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.