CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Qing Zong, Jiayu Liu, Tianshi Zheng, Chunyang Li, Baixuan Xu, Haochen Shi, Weiqi Wang, Zhaowei Wang, Chunkit Chan, Yangqiu Song
2025-11-10
Summary
This paper investigates how to make large language models (LLMs) better at expressing how confident they are in their answers, which matters for deploying them in high-stakes settings where mistakes have serious consequences.
What's the problem?
LLMs are often poor at accurately conveying how sure they are about their responses. Simply training them to mimic reference confidence expressions doesn't work well, because it skips the reasoning behind *why* they should be confident or uncertain. Obtaining precise 'gold' confidence labels to train on is also hard, since it typically requires generating and judging multiple responses per question.
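One common way those "multiple generations" are turned into a pseudo confidence label is self-consistency-style sampling: query the model several times and use the majority answer's frequency as an empirical confidence. The sketch below illustrates the idea with a hypothetical `sample_answer` callable standing in for an LLM call; it is not code from the paper.

```python
from collections import Counter

def consistency_confidence(sample_answer, question, n_samples=10):
    """Estimate a pseudo gold confidence label: sample the model
    n_samples times and report the majority answer and its frequency."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples

# Toy stand-in for an LLM sampler: cycles through canned answers.
def fake_sampler(question, _state={"i": 0}):
    canned = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
    ans = canned[_state["i"] % len(canned)]
    _state["i"] += 1
    return ans

answer, conf = consistency_confidence(fake_sampler, "Capital of France?",
                                      n_samples=5)
# answer == "Paris", conf == 0.8
```

This also shows why gold labels are expensive: one label costs N model calls, which is exactly the overhead the critique-based approach tries to avoid.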
What's the solution?
The researchers propose using 'natural language critiques': having the model analyze the reasoning behind an answer and identify its potential weaknesses. They explore two approaches: Self-Critique, where the LLM critiques and revises its own confidence estimates, and CritiCal, a new Critique Calibration training method that uses critiques (distilled from GPT-4o as a teacher) to align the model's stated confidence with its actual accuracy. CritiCal works much better than self-assessment alone, and even surpasses its teacher, GPT-4o, on complex reasoning tasks. They also found that critiquing *confidence* (answer-specific) works best for multiple-choice questions, while critiquing *uncertainty* (question-focused) is better for open-ended questions.
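The target CritiCal trains toward, how well stated confidence matches actual accuracy, is commonly measured with Expected Calibration Error (ECE). The minimal binned implementation below illustrates the metric under standard assumptions; it is a sketch, not the paper's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between mean stated confidence
    and empirical accuracy within each confidence bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b holds confidences in (lo, hi]; bin 0 also takes c == 0.
        idx = [i for i, c in enumerate(confidences)
               if (c > lo or b == 0) and c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - acc)
    return ece

# Perfectly calibrated toy case: stated 0.75 confidence, 3 of 4 correct.
print(expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0]))
# 0.0
# Overconfident case: stated 0.9, but only half correct -> ECE 0.4.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
# 0.4
```

Lower ECE means the verbalized confidence can be read as a trustworthy probability, which is the reliability property the paper is after.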
Why it matters?
This work is important because it moves us closer to building LLMs that are more reliable and trustworthy. If a model can accurately tell us when it's unsure, we can better understand its limitations and avoid relying on it in situations where it might make mistakes. The CritiCal method also shows promise for making LLMs more adaptable to new and unfamiliar situations, which is crucial for real-world applications.
Abstract
Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods that mimic reference confidence expressions often fail to capture the reasoning needed for accurate confidence assessment. We propose natural language critiques as a solution, ideally suited for confidence calibration, as precise gold confidence labels are hard to obtain and often require multiple generations. This paper studies how natural language critiques can enhance verbalized confidence, addressing: (1) What to critique: uncertainty (question-focused) or confidence (answer-specific)? Analysis shows confidence suits multiple-choice tasks, while uncertainty excels in open-ended scenarios. (2) How to critique: self-critique or critique calibration training? We propose Self-Critique, enabling LLMs to critique and optimize their confidence beyond mere accuracy, and CritiCal, a novel Critique Calibration training method that leverages natural language critiques to improve confidence calibration, moving beyond direct numerical optimization. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines, even surpassing its teacher model, GPT-4o, in complex reasoning tasks. CritiCal also shows robust generalization in out-of-distribution settings, advancing LLM's reliability.