Steering LLM Thinking with Budget Guidance

Junyan Li, Wenshuo Zhao, Yang Zhang, Chuang Gan

2025-06-17

Summary

This paper introduces Budget Guidance, a method that helps large language models (LLMs) control how much reasoning they produce when answering questions, especially when there is a limit on how long they can think. Instead of modifying the model itself, Budget Guidance steers the model during generation to keep its thinking within a set budget, making it both faster and more accurate on problems like those in math tests.

What's the problem?

The problem is that some large language models reason at length to reach the best answer, but these long reasoning chains are costly: they consume significant time and compute, and the extra thinking is not always useful. Controlling how long these models think without sacrificing accuracy is difficult, especially when strict limits on thinking time must be enforced.

What's the solution?

The solution is a lightweight predictor that estimates how much reasoning remains as the model generates each token of its answer. Using this prediction, Budget Guidance softly steers the model so that its total reasoning stays within the target budget. It does this without fine-tuning the model, by adjusting next-token probabilities during generation to meet the thinking-length limit. The approach improves both efficiency and accuracy on math problems and can adapt to other tasks as well.
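To make the idea concrete, here is a minimal, hypothetical sketch of budget-guided decoding. It is not the authors' implementation: the toy predictor, the END_THINK token id, and the guidance rule are all assumptions made for illustration. The core idea it shows is the same as in the summary: estimate the remaining reasoning length at each step, and when that estimate would overshoot the budget, softly boost the probability of the token that ends the thinking segment.

```python
import math

END_THINK = 0  # assumed token id that closes the reasoning segment (hypothetical)

def predict_remaining(tokens_so_far):
    """Stand-in for the paper's lightweight length predictor.

    Toy heuristic: assume the model tends to think for about 100 tokens
    in total. The real method uses a small learned predictor conditioned
    on the generated prefix.
    """
    return max(0, 100 - len(tokens_so_far))

def budget_guided_logits(logits, tokens_so_far, budget, strength=1.0):
    """Softly steer next-token logits to respect a thinking budget."""
    remaining_budget = budget - len(tokens_so_far)
    expected_remaining = predict_remaining(tokens_so_far)
    # How far the predicted remaining length overshoots the budget.
    overshoot = max(0.0, expected_remaining - remaining_budget)
    guided = list(logits)
    # Boost the end-of-thinking token proportionally to the overshoot,
    # so guidance is gentle early on and firm near the limit.
    guided[END_THINK] += strength * overshoot
    return guided

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with 90 thinking tokens already generated and a budget of 95, the toy predictor expects 10 more tokens, so the end-of-thinking token gets a boost and becomes more likely; early in generation, when no overshoot is predicted, the logits pass through unchanged. The "soft" part is the key design choice: rather than hard-truncating the reasoning at the budget, the guidance reshapes the next-token distribution so the model ends its thinking naturally.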

Why it matters?

This matters because making large language models more efficient without losing performance saves time and computing resources. With Budget Guidance, AI can give better answers faster, especially on difficult tasks like math, which helps in real-world applications where both speed and accuracy count. It also demonstrates a new way to control an AI's reasoning at inference time without expensive retraining.

Abstract

Budget guidance is a method that steers LLM reasoning within a targeted budget without fine-tuning and achieves improved efficiency and performance on math benchmarks.