Token-Budget-Aware LLM Reasoning

Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, Zhenting Wang

2024-12-26

Summary

This paper introduces Token-Budget-Aware LLM Reasoning, a new method that helps large language models (LLMs) reason effectively while using fewer tokens, making them more efficient and cost-effective.

What's the problem?

Large language models are good at reasoning through problems, but methods like Chain-of-Thought (CoT) generate long intermediate reasoning steps, which drives up token usage and cost. Much of this reasoning is longer than necessary, and striking the right balance between accuracy and token efficiency is challenging.

What's the solution?

To address this issue, the authors propose a framework called Token-Budget-Aware LLM Reasoning. The method estimates an appropriate token budget for each problem based on its reasoning complexity and includes that budget in the prompt to guide the model. With a reasonable limit on the number of tokens, the model focuses on the most important parts of the reasoning process. Experiments show that this approach significantly reduces token usage—by an average of 68.64%—while largely maintaining answer accuracy.
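The idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `call_llm` stub, the exact prompt wordings, and the fallback budget of 50 are all assumptions made for the sketch (the real code is in the TALE repository linked below).

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return "42"  # placeholder reply so the sketch runs offline

def estimate_budget(question: str) -> int:
    """Ask the model itself to estimate how many reasoning tokens the
    question needs -- one way to gauge reasoning complexity."""
    reply = call_llm(
        "Estimate how many tokens are needed to reason through this "
        f"question. Reply with a single integer.\nQuestion: {question}"
    )
    try:
        return max(1, int(reply))
    except ValueError:
        return 50  # assumed fallback budget if the reply is not a number

def budgeted_cot_prompt(question: str, budget: int) -> str:
    """Embed the estimated token budget directly in a CoT prompt."""
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )

question = "If a train travels 60 km in 45 minutes, what is its speed in km/h?"
budget = estimate_budget(question)
answer = call_llm(budgeted_cot_prompt(question, budget))
```

The key design choice is that the budget is estimated per problem rather than fixed globally, so simple questions get tight budgets while harder ones keep room to reason.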

Why it matters?

This research matters because it offers a practical way to make AI systems more efficient. Reducing the number of tokens needed for reasoning lowers operational costs while still delivering accurate results—a balance that is crucial for deploying LLMs in applications such as chatbots and automated question-answering systems.

Abstract

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. We find that the reasoning process of current LLMs is unnecessarily lengthy and it can be compressed by including a reasonable token budget in the prompt, but the choice of token budget plays a crucial role in the actual compression effectiveness. We then propose a token-budget-aware LLM reasoning framework, which dynamically estimates token budgets for different problems based on reasoning complexity and uses the estimated token budgets to guide the reasoning process. Experiments show that our method effectively reduces token costs in CoT reasoning with only a slight performance reduction, offering a practical solution to balance efficiency and accuracy in LLM reasoning. Code: https://github.com/GeniusHTX/TALE.