Budget-Aware Tool-Use Enables Effective Agent Scaling

Tengxiao Liu, Zifeng Wang, Jin Miao, I-Hung Hsu, Jun Yan, Jiefeng Chen, Rujun Han, Fangyuan Xu, Yanfei Chen, Ke Jiang, Samira Daruki, Yi Liang, William Yang Wang, Tomas Pfister, Chen-Yu Lee

2025-11-25

Summary

This paper investigates how to make tool-using AI agents, such as web search agents, more effective when there is a limit on how many tool calls they can make. The goal is to get the most out of these agents without letting them waste resources.

What's the problem?

When you give AI agents a larger tool-call budget, you'd expect them to perform better, but simply allowing more tool use doesn't always work. These agents don't naturally understand that they *have* a limit on tool usage, and they quickly hit a performance ceiling no matter how many extra calls they are given. They essentially don't know how to manage their 'budget' of tool calls.

What's the solution?

The researchers developed three main things. First, they created a 'Budget Tracker,' a lightweight add-on that constantly reminds the agent how many tool calls it has left. Second, they built a more advanced framework called 'BATS' (Budget-Aware Test-time Scaling), which uses this budget information to make smart decisions, choosing when to 'dig deeper' on a promising lead and when to 'pivot' to a completely new approach based on how many tool calls remain. Finally, they defined a unified cost metric that accounts for token usage (the language model part) and tool calls together, so cost and performance can be compared fairly. A minimal sketch of the first two ideas follows.
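To make this concrete, here is a minimal sketch of what a budget tracker and a budget-aware 'dig deeper vs. pivot' rule could look like. All names and thresholds here (`BudgetTracker`, `should_pivot`, the 0.5 cutoffs) are illustrative assumptions, not the paper's actual implementation.

```python
class BudgetTracker:
    """Tracks remaining tool calls and exposes a reminder string that
    can be appended to the agent's prompt before every step."""

    def __init__(self, total_calls: int):
        self.total = total_calls
        self.used = 0

    def record_call(self) -> None:
        # Call this after every tool invocation (e.g., each web search).
        self.used += 1

    @property
    def remaining(self) -> int:
        return self.total - self.used

    def reminder(self) -> str:
        # Continuous budget awareness: injected into the context each turn.
        return f"Tool-call budget: {self.remaining} of {self.total} remaining."


def should_pivot(tracker: BudgetTracker, lead_confidence: float,
                 budget_cutoff: float = 0.5) -> bool:
    """Toy BATS-style policy (an assumption, not the paper's exact
    strategy): while budget is ample and the current lead looks weak,
    pivot to a new path; as budget runs low, commit to digging deeper
    on the most promising lead instead of restarting."""
    budget_fraction = tracker.remaining / tracker.total
    return budget_fraction > budget_cutoff and lead_confidence < 0.5
```

In an agent loop, one would call `record_call()` after each search and append `reminder()` to the prompt, so the model always sees how much budget is left before deciding its next action.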

Why it matters?

This work is important because it shows how to build more efficient and reliable tool-using AI agents. By making agents 'budget-aware,' we can get better performance from the same amount of resources and understand the trade-offs between cost and effectiveness. It also provides a more structured way to think about scaling up these agents and improving their abilities.

Abstract

Scaling test-time computation improves performance across different tasks on large language models (LLMs), which has also been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" via tool calls. The number of tool calls directly bounds the agent's interaction with the external environment. However, we find that simply granting agents a larger tool-call budget fails to improve performance, as they lack "budget awareness" and quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness, enabling simple yet effective scaling. We further develop BATS (Budget Aware Test-time Scaling), an advanced framework that leverages this awareness to dynamically adapt its planning and verification strategy, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths based on remaining resources. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We provide the first systematic study on budget-constrained agents, showing that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.
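The abstract does not spell out the unified cost metric, but a natural form for a metric that "jointly accounts for token and tool consumption" is a weighted sum; the weights below are assumptions that would convert both quantities into a common unit such as dollars.

```latex
% A plausible form of the unified cost metric (an assumption; the paper
% may define it differently): total cost as a weighted sum of token and
% tool consumption.
C_{\text{total}} = \alpha \cdot N_{\text{tokens}} + \beta \cdot N_{\text{tools}}
% N_tokens: LLM tokens consumed ("thinking"); N_tools: tool calls made
% ("acting"); alpha, beta: per-unit prices mapping both to one scale.
```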