Efficient Reasoning with Balanced Thinking
Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian
2026-03-19
Summary
This paper addresses a key issue with large reasoning models: they often either think too much (overthinking) or not enough (underthinking) when solving problems, leading to wasted resources and inaccurate answers.
What's the problem?
Large reasoning models, while powerful, aren't always efficient. Sometimes they get stuck in loops, doing extra calculations on easy problems, and other times they don't explore enough possibilities on harder ones. Current attempts to fix overthinking can accidentally make the model *underthink*, reducing its accuracy. Essentially, it's hard to get these models to consistently use the right amount of 'thinking' for each problem.
What's the solution?
The researchers developed a method called ReBalance that requires no additional training of the model. It monitors the model's confidence throughout the reasoning process: if confidence fluctuates sharply (high variance), that signals overthinking, and ReBalance steers the model toward pruning redundant steps; if the model is consistently *too* confident, that signals underthinking, and ReBalance encourages it to explore more reasoning paths. Concretely, the method aggregates hidden states collected on a small set of example problems into "reasoning mode" prototypes, derives a steering vector from them, and then uses real-time confidence to adjust that vector's strength and direction while the model generates its answer.
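The overall idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual implementation: the function names, thresholds, and the exact form of the control function are assumptions made for clarity.

```python
# Hypothetical sketch of confidence-guided steering in the spirit of ReBalance.
# All names and thresholds here are illustrative assumptions, not the paper's API.
import numpy as np

def reasoning_prototypes(over_states, under_states):
    """Aggregate hidden states from a small calibration set into
    'overthinking' and 'underthinking' mode prototypes (simple means)."""
    return over_states.mean(axis=0), under_states.mean(axis=0)

def steering_vector(over_proto, under_proto):
    """Unit direction pointing from redundant toward exploratory reasoning."""
    v = under_proto - over_proto
    return v / (np.linalg.norm(v) + 1e-8)

def control_strength(confidences, var_thresh=0.02, conf_thresh=0.9):
    """Dynamic control: negative strength prunes redundancy when confidence
    fluctuates a lot (overthinking); positive strength promotes exploration
    when the model is consistently overconfident (underthinking)."""
    conf = np.asarray(confidences, dtype=float)
    if conf.var() > var_thresh:        # jumpy confidence -> overthinking
        return -min(conf.var() * 10.0, 1.0)
    if conf.mean() > conf_thresh:      # steady overconfidence -> underthinking
        return min((conf.mean() - conf_thresh) * 10.0, 1.0)
    return 0.0                         # balanced: leave the trajectory alone

def steer(hidden, vec, alpha):
    """Nudge the current hidden state along the steering direction."""
    return hidden + alpha * vec
```

In this sketch, a constant high confidence trace yields a positive strength (push toward exploration), a wildly fluctuating trace yields a negative one (prune redundancy), and a moderate, stable trace leaves the reasoning trajectory untouched.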
Why it matters?
This work is important because it offers a practical way to make large reasoning models more reliable and efficient. Because ReBalance doesn't need extra training, it can be easily applied to existing models, making them more useful in situations where computing power is limited or accuracy is critical, like in math, answering general questions, or even writing code.
Abstract
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector to guide LRMs' reasoning trajectories. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking, and promoting exploration during underthinking. Extensive experiments conducted on four models ranging from 0.5B to 32B, and across nine benchmarks in math reasoning, general question answering, and coding tasks demonstrate that ReBalance effectively reduces output redundancy while improving accuracy, offering a general, training-free, and plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance .