ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
Yifei Chen, Guanting Dong, Zhicheng Dou
2026-01-13
Summary
This paper introduces ET-Agent, a new way to train computer programs that use external tools to solve problems. These programs are built on Large Language Models (LLMs), powerful AI systems that nonetheless often struggle to use tools effectively.
What's the problem?
Large Language Models are getting better at using tools to extend their knowledge, but current training methods mostly focus on getting the *right* answer. They don't pay enough attention to *how* the program uses those tools. This leads to programs that make unnecessary tool calls, don't use tools when they should, or generally take a roundabout way to find a solution. Essentially, the programs aren't 'smart' about *how* they think through a problem using tools.
What's the solution?
The researchers developed ET-Agent, a training system that improves how these programs use tools. It works in two main steps. First, it creates better training data by letting the program explore and learn from its own mistakes. Then, it uses this data to train the program in two phases, gradually correcting bad habits and encouraging efficient tool use. This helps the program learn to choose the right tools at the right time and avoid wasting steps.
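The two steps above can be sketched in miniature. The function names, reward weights, and filtering threshold below are illustrative assumptions, not details from the paper: a toy reward scores a trajectory on correctness while penalizing redundant or missing tool calls, and a filtering pass keeps only high-scoring trajectories as fine-tuning data, mimicking the self-evolving data flywheel.

```python
# Hypothetical sketch of ET-Agent's two-step idea (all names and numbers
# are illustrative, not taken from the paper).

def behavior_reward(correct, tool_calls, needed_calls):
    """Score a trajectory: correctness minus penalties for redundant
    or insufficient tool calls (a toy reward, not the paper's exact one)."""
    reward = 1.0 if correct else 0.0
    redundant = max(0, tool_calls - needed_calls)  # wasted tool calls
    missing = max(0, needed_calls - tool_calls)    # tools skipped
    return reward - 0.1 * redundant - 0.2 * missing

def data_flywheel(trajectories, threshold=0.8):
    """Keep only high-reward trajectories as fine-tuning data,
    mimicking the self-evolving data selection step."""
    return [t for t in trajectories if behavior_reward(**t) >= threshold]

trajectories = [
    {"correct": True,  "tool_calls": 2, "needed_calls": 2},  # efficient
    {"correct": True,  "tool_calls": 5, "needed_calls": 2},  # redundant calls
    {"correct": False, "tool_calls": 0, "needed_calls": 2},  # no tool use
]
kept = data_flywheel(trajectories)
print(len(kept))  # only the efficient trajectory survives
```

In a real system the filtered trajectories would then feed the two-phase training loop, with the reward gradually steering the model away from redundant or absent tool calls.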
Why it matters?
This research is important because it addresses a key weakness in current AI systems that rely on tools. By improving how these programs use tools, we can make them more reliable, efficient, and capable of solving complex problems. The ET-Agent framework provides a practical approach for building better AI agents that can effectively interact with the real world.
Abstract
Large Language Models (LLMs) can extend the limits of their parametric knowledge by adopting the Tool-Integrated Reasoning (TIR) paradigm. However, existing LLM-based agent training frameworks often focus on answer accuracy, overlooking explicit alignment of behavior patterns. Consequently, agents often exhibit ineffective actions during TIR tasks, such as redundant or insufficient tool calls. How to calibrate erroneous behavioral patterns when executing TIR tasks, and thereby explore effective trajectories, remains an open problem. In this paper, we propose ET-Agent, a training framework for calibrating an agent's tool-use behavior from two synergistic perspectives: a Self-evolving Data Flywheel and Behavior Calibration Training. Specifically, we introduce a self-evolving data flywheel to generate enhanced data, which is used to fine-tune the LLM and improve its exploration ability. On this basis, we implement a two-phase behavior-calibration training framework designed to progressively calibrate erroneous behavioral patterns toward optimal behaviors. Further in-depth experiments confirm the superiority of ET-Agent across multiple dimensions, including correctness, efficiency, reasoning conciseness, and tool execution accuracy. Our ET-Agent framework provides practical insights for research in the TIR field. Code is available at https://github.com/asilverlight/ET-Agent