Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang
2026-04-20
Summary
This paper focuses on making large reasoning models, AI systems that solve problems by working through them step by step, more efficient. These models often explore many possible solution paths in parallel, but much of that work is wasted when a path goes wrong early on.
What's the problem?
Large reasoning models try out many different paths to arrive at an answer, but they can get stuck exploring incorrect options if they make a mistake at the beginning. This wastes a lot of computing power and time. While researchers have tried to cut off these unproductive paths, there hasn't been a clear way to categorize and understand the different approaches to doing so, making it hard to improve them.
What's the solution?
The researchers created a system called STOP (Super TOken for Pruning) to intelligently prune, or cut off, unproductive paths in large reasoning models. They first organized existing path-pruning methods into a framework based on two axes: where the pruning signal comes from (internal to the model or external) and whether the method learns to prune or follows a fixed rule. This classification revealed that methods that *learn* to prune using signals *from within* the model were largely unexplored yet promising, and STOP is designed around exactly this idea. They tested STOP on models ranging from 1.5B to 20B parameters and showed it is both more accurate and more efficient than existing baselines.
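To make the prefix-level pruning idea concrete, here is a minimal toy sketch of "score partial paths, cut the weak ones, extend only the survivors". Everything in it is illustrative: the summary does not describe STOP's actual mechanism, so the scoring function below (a simple mean of simulated step qualities) merely stands in for whatever learned internal signal the model would emit.

```python
import heapq

# Hypothetical stand-in for a learned internal pruning signal.
# Here a partial path is just a list of simulated step-quality scores,
# and a prefix's score is the mean quality of its steps so far.
def prefix_score(path):
    return sum(path) / len(path)

def prune_and_extend(paths, keep_k, step_qualities):
    """One round of parallel reasoning with prefix-level pruning:
    score every partial path, keep only the top-k, then extend the
    survivors by one step. Compute is never spent extending the
    paths that started badly."""
    survivors = heapq.nlargest(keep_k, paths, key=prefix_score)
    return [p + [q] for p, q in zip(survivors, step_qualities)]

# Four parallel paths; the two that begin with low-quality steps
# are pruned before any further extension.
paths = [[0.9], [0.1], [0.8], [0.2]]
paths = prune_and_extend(paths, keep_k=2, step_qualities=[0.7, 0.6])
print(paths)  # only the two strong prefixes survive, each one step longer
```

The key design point this toy captures is that pruning happens on *prefixes*, before a path is fully generated, which is where the compute savings come from.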
Why it matters?
This work is important because it makes large reasoning models more practical and affordable to use. By reducing wasted computation, these models can be applied to more complex problems and become more accessible. The researchers also provide guidelines for how to best use their method in real-world situations, and they’ve made their code and models publicly available so others can build on their work.
Abstract
Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic taxonomy of path pruning, categorizing methods by their signal source (internal vs. external) and learnability (learnable vs. non-learnable). This classification reveals the unexplored potential of learnable internal methods, motivating our proposal of STOP (Super TOken for Pruning). Extensive evaluations across LRMs ranging from 1.5B to 20B parameters demonstrate that STOP achieves superior effectiveness and efficiency compared to existing baselines. Furthermore, we rigorously validate the scalability of STOP under varying compute budgets - for instance, boosting GPT-OSS-20B accuracy on AIME25 from 84% to nearly 90% under fixed compute budgets. Finally, we distill our findings into formalized empirical guidelines to facilitate optimal real-world deployment. Code, data and models are available at https://bijiaxihh.github.io/STOP