Are LLM Decisions Faithful to Verbal Confidence?
Jiawei Wang, Yanfei Zhou, Siddartha Devic, Deqing Fu
2026-01-13
Summary
This paper investigates whether large language models (LLMs) actually *understand* their own uncertainty, or merely *appear* to. It asks whether the confidence an LLM expresses actually influences its decisions, especially when there is a risk of being wrong.
What's the problem?
LLMs can report how confident they are in their answers, but it's unclear whether that confidence is meaningful. The core issue is that just because a model *says* it's unsure doesn't mean it will actually avoid answering when a wrong answer could be costly. The researchers wanted to know whether LLMs can adjust their behavior, such as choosing not to answer, based on how bad it would be to give a wrong answer.
What's the solution?
The researchers created a testing system called RiskEval. This system presented LLMs with questions while varying the 'penalty' for getting the answer wrong, making it more or less risky to answer. They then observed whether the models changed how often they abstained from answering across these different risk levels. They tested several of the most advanced LLMs currently available.
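To make the trade-off concrete, here is a minimal sketch of the kind of expected-utility reasoning a risk-sensitive model should perform. The scoring rule (+1 for a correct answer, a variable penalty for a wrong one, 0 for abstaining) is an illustrative assumption, not necessarily the paper's exact rubric:

```python
# Hedged sketch of the abstention trade-off a RiskEval-style setup probes.
# Assumed scoring (illustrative, not the paper's exact rubric):
#   +1 for a correct answer, -penalty for a wrong answer, 0 for abstaining.

def expected_utility(p_correct: float, penalty: float) -> float:
    """Expected score from answering with probability p_correct of being right."""
    return p_correct * 1.0 + (1.0 - p_correct) * (-penalty)

def should_abstain(p_correct: float, penalty: float) -> bool:
    """Abstaining (utility 0) beats answering when p_correct < penalty / (1 + penalty)."""
    return expected_utility(p_correct, penalty) < 0.0

# With 70% confidence, answering is rational under a mild penalty...
print(should_abstain(0.7, penalty=1))   # False
# ...but irrational when a wrong answer costs 9 points.
print(should_abstain(0.7, penalty=9))   # True
```

Under this rule, raising the penalty pushes the break-even confidence threshold toward 1, so a rational agent should abstain more often as the stakes rise; the paper's finding is that frontier models largely fail to do this.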
Why does it matter?
The findings are concerning because the models didn't act rationally. Even when it was clearly better to abstain from answering to avoid a large penalty, they almost always tried to answer anyway, leading to poor results. This suggests that the confidence scores LLMs provide aren't reliable indicators of their true understanding of risk, and we can't necessarily trust them to make safe, well-considered decisions just because they *say* they're uncertain.
Abstract
Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the reasoning, knowledge, or decision-making of the model. To test this, we introduce RiskEval: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal, risk-sensitive decisions.