
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou

2025-04-10


Summary

This paper shows that AI models trained to ‘think step-by-step’ give long, rambling answers to questions that are missing key information, wasting time and compute instead of spotting the flaw in the question.

What's the problem?

When a question is missing a key detail, reasoning-focused AI models write far too much, over-explaining a math problem that cannot be solved instead of quickly pointing out what information is missing.

What's the solution?

The study finds that AI models not specially trained for reasoning actually handle these ill-posed questions better, giving shorter answers that flag the missing premise. This suggests that current training recipes for reasoning AI need to teach ‘critical thinking’ to avoid wasted effort.
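The paper evaluates models on datasets curated by removing a necessary premise from otherwise well-posed questions. A minimal sketch of that construction, with illustrative data and a hypothetical helper function (not the authors' actual pipeline):

```python
# Hypothetical sketch of building a missing-premise (MiP) variant of a
# question: drop one premise the answer depends on, yielding an ill-posed
# query. The data and function name here are illustrative, not from the paper.

def make_mip_variant(premises, question, drop_index):
    """Remove the premise at drop_index, producing an unanswerable query."""
    kept = [p for i, p in enumerate(premises) if i != drop_index]
    return " ".join(kept + [question])

premises = ["Alice has 3 apples.", "Bob gives Alice 2 more apples."]
question = "How many apples does Alice have now?"

well_posed = " ".join(premises + [question])
ill_posed = make_mip_variant(premises, question, drop_index=1)

print(well_posed)
print(ill_posed)  # second premise removed, so the question cannot be answered
```

A model with good critical thinking should answer the first query and briefly flag the second as unanswerable; the paper's finding is that reasoning-tuned models instead produce very long responses to the ill-posed version.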

Why it matters?

This helps make AI smarter and more efficient, saving energy and improving tools like homework helpers or customer service bots that need to spot unclear questions fast.

Abstract

We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, drastically increases for ill-posed questions with missing premises (MiP), ending up with redundant and ineffective thinking. This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name as the MiP-Overthinking. Such failures are against the "test-time scaling law" but have been widely observed on multiple datasets we curated with MiP, indicating the harm of cheap overthinking and a lack of critical thinking. Surprisingly, LLMs not specifically trained for reasoning exhibit much better performance on the MiP scenario, producing much shorter responses that quickly identify ill-posed queries. This implies a critical flaw of the current training recipe for reasoning LLMs, which does not encourage efficient thinking adequately, leading to the abuse of thinking patterns. To further investigate the reasons behind such failures, we conduct fine-grained analyses of the reasoning length, overthinking patterns, and location of critical thinking on different types of LLMs. Moreover, our extended ablation study reveals that the overthinking is contagious through the distillation of reasoning models' responses. These results improve the understanding of overthinking and shed novel insights into mitigating the problem.