SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu
2025-12-02
Summary
This paper introduces a new method, called SCALE, for improving how large language models (LLMs) solve complex math problems. It focuses on making the models more efficient by smartly deciding where to spend their processing power during problem-solving.
What's the problem?
Currently, when LLMs are given more computing power to tackle hard math problems, that power is spread evenly across *all* steps of the solution. This is wasteful: some steps are routine and need little effort, while others are genuinely difficult and need much more. Resources spent on the easy parts are effectively thrown away, the hardest parts stay under-served, and performance soon hits a point where adding more compute barely helps.
What's the solution?
SCALE is inspired by dual-process theory, the idea that humans use two different thinking styles: a fast, automatic 'System 1' for easy tasks and a slower, more deliberate 'System 2' for hard ones. SCALE works in four stages: it first breaks a math problem into smaller sequential steps, then assesses how difficult each step is, assigns each step a processing mode (System 1 for easy steps, System 2 for hard ones), and finally solves the steps in order, carrying information from one step to the next. The effect is that heavy computation is spent only on the difficult steps, while the easy ones are handled quickly.
Why it matters?
This research is important because it shows a way to get significantly better results from LLMs on math problems *without* massively increasing computing power. SCALE improves accuracy by up to 13.75 percentage points (for example, from 57.50% to 71.25% on AIME25) while cutting the computation needed by 33-53%, making these powerful models more practical and efficient for complex mathematical tasks.
Abstract
Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during inference. However, current methods employ uniform resource distribution across all reasoning sub-problems, creating fundamental bottlenecks: challenging sub-problems receive insufficient attention while routine operations consume disproportionate resources, so additional computation yields diminishing returns. Inspired by dual-process theory, we propose SCALE (Selective Resource Allocation), a framework that selectively allocates computational resources based on sub-problem difficulty. SCALE operates through four stages: (1) problem decomposition into sequential reasoning sub-problems, (2) difficulty assessment of each sub-problem to distinguish routine operations from computationally challenging ones, (3) selective processing mode assignment, with System 1 for simple sub-problems and System 2 for complex ones, and (4) sequential execution with context propagation. By concentrating resources on challenging sub-problems while processing routine operations efficiently, SCALE achieves substantial performance improvements with superior resource utilization. Extensive experiments demonstrate that SCALE significantly outperforms uniform scaling baselines, achieving accuracy improvements of up to 13.75 percentage points (57.50% to 71.25% on AIME25) while reducing computational costs by 33%-53%, representing a major advance in test-time scaling that addresses fundamental limitations of current approaches.