Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqing Yang

2026-03-26

Summary

This paper investigates self-distillation, a technique used to improve large language models (LLMs), and finds that it surprisingly *hurts* their ability to solve math problems. The models become overconfident and stop showing their 'thinking process', which turns out to be important for tackling new and difficult problems.

What's the problem?

Self-distillation usually makes LLMs both better and more concise when explaining their reasoning. However, when applied to mathematical reasoning, it makes the models give shorter answers that are actually *less* accurate. The core issue is that self-distillation discourages the model from expressing uncertainty, that is, from admitting when it's not sure about a step in the problem-solving process.

What's the solution?

The researchers ran controlled experiments in which the 'teacher' model (the one used for self-distillation) was conditioned on different amounts of information. Giving the teacher rich context made it very good at solving problems it had already seen, but worse at solving *new* problems, because the extra information suppressed the model's tendency to express uncertainty. They tested this on several different LLMs, including Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, and observed out-of-domain performance drops of up to 40%.
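To make the setup concrete, here is a minimal toy sketch of a self-distillation data pipeline (this is illustrative only, not the paper's actual code; the function names and the string-based "traces" are invented stand-ins for real model sampling). The same model acts as teacher and student: the teacher samples reasoning traces, and only traces reaching the correct answer are kept for fine-tuning. The `rich_context` flag mimics conditioning the teacher on extra information such as the gold answer, which the paper finds yields confident traces stripped of uncertainty expressions.

```python
def teacher_generate(problem: str, answer: str, rich_context: bool) -> str:
    """Stand-in for sampling a reasoning trace from the teacher model."""
    if rich_context:
        # Conditioned on rich context (e.g. the answer): short, confident trace.
        return f"Compute directly. The answer is {answer}."
    # Weakly conditioned: the trace verbalizes uncertainty before concluding.
    return f"Hmm, let me try a few approaches... checking... the answer is {answer}."

def build_distillation_set(dataset, rich_context: bool):
    """Keep only (problem, trace) pairs whose trace ends in the correct answer."""
    kept = []
    for problem, answer in dataset:
        trace = teacher_generate(problem, answer, rich_context)
        if trace.rstrip(".").endswith(answer):  # simplistic answer check
            kept.append((problem, trace))
    return kept

dataset = [("2 + 2 = ?", "4"), ("3 * 5 = ?", "15")]
confident_set = build_distillation_set(dataset, rich_context=True)
hedged_set = build_distillation_set(dataset, rich_context=False)
```

Fine-tuning the student on `confident_set` corresponds to the regime the paper identifies as harmful out-of-domain: the training traces never model uncertainty, so the student loses the habit of checking itself on unfamiliar problems.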

Why it matters?

This research shows that it's not enough for an LLM to give the right answer; it also matters *how* the model reasons, including whether it can express when it's unsure. Allowing a model to verbalize its uncertainty is crucial for making it robust to problems it hasn't encountered before, and simply reinforcing correct answer traces isn't enough to achieve that.

Abstract

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization - the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly. Across Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, we observe performance drops of up to 40%. Our findings highlight that exposing appropriate levels of uncertainty is crucial for robust reasoning and underscore the importance of optimizing reasoning behavior beyond merely reinforcing correct answer traces.