ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition
Muyang Zhao, Qi Qi, Hao Sun
2026-01-08
Summary
This paper explores how to make large language models (LLMs) more efficient when solving multiple problems under limited resources, specifically a cap on the total number of 'thinking' tokens they can spend.
What's the problem?
LLMs are powerful, but they don't inherently know how much effort a problem requires. Under a strict limit on how much total 'thinking' they can do, they can waste resources on easy tasks or fail on hard ones. Imagine having a limited number of calculations to spend across a set of math problems: you need to decide which problems are worth attempting at all, and how many calculations to allocate to each.
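The budgeting problem above can be sketched as a small knapsack-style heuristic. This is an illustrative toy, not the paper's method: the problem names, costs, and solve probabilities are invented, and a simple greedy ratio stands in for the learned allocation strategy.

```python
# Hypothetical sketch: each problem has an estimated token cost and an
# estimated chance of being solved if attempted; a greedy ROI heuristic
# picks which problems to attempt under a global token budget.
# All numbers and names below are illustrative, not from the paper.

def greedy_roi_allocation(problems, budget):
    """Select problems to attempt, highest expected-score-per-token first."""
    ranked = sorted(problems, key=lambda p: p["p_solve"] / p["cost"], reverse=True)
    plan, spent = [], 0
    for p in ranked:
        if spent + p["cost"] <= budget:  # attempt only if it fits the budget
            plan.append(p["id"])
            spent += p["cost"]
    return plan, spent

problems = [
    {"id": "easy",   "cost": 200,  "p_solve": 0.9},   # high ROI
    {"id": "medium", "cost": 800,  "p_solve": 0.6},
    {"id": "hard",   "cost": 3000, "p_solve": 0.2},   # low ROI
]
plan, spent = greedy_roi_allocation(problems, budget=1200)
print(plan, spent)  # -> ['easy', 'medium'] 1000
```

Under a 1200-token budget, the heuristic attempts the easy and medium problems and skips the hard one, whose expected payoff per token is too low to justify its cost.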
What's the solution?
The researchers developed a system called ROI-Reasoning. It works in two steps: first, they 'train' the LLM to predict how difficult a problem is and how much it would benefit from more thinking. This allows the model to decide whether to even *try* solving a problem. Second, they use a technique called reinforcement learning to teach the model to strategically allocate its limited 'thinking steps' across multiple problems to get the best overall result. It's like learning to budget your time effectively when studying for multiple exams.
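The first stage's solve-or-skip decision can be illustrated with a minimal sketch. In the paper the cost and utility predictions come from the fine-tuned model itself; here they are passed in as plain numbers, and the ROI threshold is an invented parameter.

```python
# Illustrative sketch of a solve-or-skip decision: before generating a full
# solution, the model predicts a token cost and an expected utility (e.g.
# probability of a correct answer), and a simple ROI threshold decides
# whether the attempt is worthwhile. The threshold value is hypothetical.

def solve_or_skip(predicted_cost, predicted_utility, remaining_budget,
                  roi_threshold=0.0005):
    """Attempt only if the problem fits the budget and its ROI is high enough."""
    if predicted_cost > remaining_budget:
        return "skip"          # cannot afford a full attempt
    roi = predicted_utility / predicted_cost
    return "solve" if roi >= roi_threshold else "skip"

print(solve_or_skip(predicted_cost=400, predicted_utility=0.8,
                    remaining_budget=1000))   # high ROI -> solve
print(solve_or_skip(predicted_cost=5000, predicted_utility=0.3,
                    remaining_budget=1000))   # over budget -> skip
```

The second stage would then tune such decisions with reinforcement learning, so the model learns when skipping now preserves budget for higher-value problems later.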
Why it matters?
This research is important because it makes LLMs more practical for real-world applications where resources are always limited. By making LLMs more efficient and strategic in their reasoning, we can get better performance from the same amount of computing power, which is crucial for wider accessibility and deployment of these powerful AI tools.
Abstract
Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP). This perspective highlights a meta-cognitive requirement: anticipating task difficulty, estimating return on investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. In the second stage, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a hard token budget, allowing models to learn long-horizon allocation strategies. Across budgeted mathematical reasoning benchmarks, ROI-Reasoning consistently improves overall score while substantially reducing regret under tight computation budgets.