EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs

Jewon Yeom, Jaewon Sok, Seonghyeon Park, Jeongjae Park, Taesup Kim

2026-01-14

Summary

This paper focuses on improving how well large language models (LLMs) can reason and, importantly, how accurately they gauge their own confidence. Current methods for improving reasoning often make models *too* sure of themselves, even when they're wrong, which makes their confidence estimates unreliable.

What's the problem?

Large language models are getting better at tasks that require reasoning, but the way they're trained often leads to a lack of self-awareness. They become overconfident and can't accurately assess when their reasoning might be flawed. This happens because training focuses on reinforcing correct answers, but doesn't teach the model to recognize uncertainty. It's like a student who always studies the right answers but never learns to identify what they *don't* know.
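That mismatch between confidence and correctness can be measured. A standard way to do it is expected calibration error (ECE): bin predictions by stated confidence and average the gap between confidence and actual accuracy in each bin. The sketch below is illustrative, not code from the paper; the function name and toy numbers are made up for the example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average |accuracy - confidence| per bin,
    weighted by how many predictions fall in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight * confidence-accuracy gap
    return ece

# An overconfident model: claims ~95% confidence but is right only 60% of the time.
conf = [0.95] * 10
hits = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(round(expected_calibration_error(conf, hits), 2))  # → 0.35
```

A well-calibrated model would have a confidence near 0.6 on these same answers, driving the ECE toward zero; the 0.35 gap here is exactly the overconfidence the paper is targeting.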

What's the solution?

The researchers propose a new training method called EpiCaR, which stands for epistemically-calibrated reasoning. Instead of just focusing on getting the right answer, EpiCaR trains the model to also understand *when* its reasoning is trustworthy. It does this by having the model evaluate its own work and learn from those self-evaluations. They used this method with models like Llama-3 and Qwen-3, and it improved both accuracy and the model's ability to gauge its own confidence.
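At a high level, the data-collection step of such an iterative loop might look like the following sketch. This is a toy illustration of the idea, not the paper's actual objective or implementation; every function name here (`toy_model`, `self_evaluate`, `build_epicar_batch`) is invented for the example. The key difference from accuracy-only self-training is that each sampled reasoning chain is paired with the model's own confidence score and a ground-truth verdict, so the fine-tuning data teaches calibration as well as correctness.

```python
import random

random.seed(0)

def toy_model(question):
    """Stand-in for an LLM: returns (reasoning, answer). Here it just guesses."""
    guess = random.choice([question["answer"], question["answer"] + 1])
    return f"reasoning for {question['q']}", guess

def self_evaluate(reasoning, answer):
    """Stand-in for the model scoring its own reasoning chain in [0, 1]."""
    return random.uniform(0.3, 0.9)

def build_epicar_batch(questions, k=10):
    """One round of data collection (sketch): sample k chains per question,
    attach the model's own confidence, and keep chains *with* their verdicts
    so training can reward honest uncertainty, not just successful answers."""
    batch = []
    for q in questions:
        for _ in range(k):
            reasoning, answer = toy_model(q)
            batch.append({
                "prompt": q["q"],
                "reasoning": reasoning,
                "self_confidence": self_evaluate(reasoning, answer),
                "verdict": answer == q["answer"],  # ground-truth check
            })
    return batch

questions = [{"q": "2+2", "answer": 4}]
batch = build_epicar_batch(questions, k=5)
print(len(batch))  # → 5
```

Contrast this with methods like STaR, which keep only the correct chains: by discarding failures and self-assessments, those methods reinforce success but give the model no signal about when its reasoning tends to fail.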

Why it matters?

This work is important because it addresses a key limitation of current LLMs: their tendency to be overly confident. By making models more aware of their own limitations, we can build more reliable and trustworthy AI systems. The researchers also showed that this approach can reduce the amount of computing power needed to get good results, making these models more efficient to use.

Abstract

Improving the reasoning abilities of large language models (LLMs) has largely relied on iterative self-training with model-generated data. While effective at boosting accuracy, existing approaches primarily reinforce successful reasoning paths, incurring a substantial calibration cost: models become overconfident and lose the ability to represent uncertainty. This failure has been characterized as a form of model collapse in alignment, where predictive distributions degenerate toward low-variance point estimates. We address this issue by reframing reasoning training as an epistemic learning problem, in which models must learn not only how to reason, but also when their reasoning should be trusted. We propose epistemically-calibrated reasoning (EpiCaR) as a training objective that jointly optimizes reasoning performance and calibration, and instantiate it within an iterative supervised fine-tuning framework using explicit self-evaluation signals. Experiments on Llama-3 and Qwen-3 families demonstrate that our approach achieves Pareto-superiority over standard baselines in both accuracy and calibration, particularly in models with sufficient reasoning capacity (e.g., 3B+). This framework generalizes effectively to OOD mathematical reasoning (GSM8K) and code generation (MBPP). Ultimately, our approach enables a 3X reduction in inference compute, matching the K=30 performance of STaR with only K=10 samples in capable models.