LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
Kangning Zhang, Wenxiang Jiao, Kounianhua Du, Yuan Lu, Weiwen Liu, Weinan Zhang, Lei Zhang, Yong Yu
2025-11-13
Summary
This paper introduces a new way to train large language models (LLMs) to be better at using external tools, like calculators or search engines, to complete tasks.
What's the problem?
Currently, training LLMs to use tools means generating training data and then training the model as two separate, non-interactive steps. This is inefficient because the training process never identifies where the model is actually struggling so that data generation can target those weaknesses. Worse, the synthetic training data often contains labeling errors, which further hinders learning.
What's the solution?
The researchers developed a system called LoopTool that combines data creation and model training into a continuous cycle. Each iteration has three steps: first, Greedy Capability Probing figures out which capabilities the model has mastered and where it fails. Second, Judgement-Guided Label Verification uses an open-source judge model to find and correct annotation errors in the training data. Third, Error-Driven Data Expansion generates new, more challenging training examples focused on the areas where the model is failing. The whole loop runs on open-source models, avoiding dependence on expensive closed-source APIs.
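The closed loop described above can be sketched as pseudocode. This is a minimal toy illustration, not the paper's implementation: the "model" is a lookup table, and all function names (`probe_capabilities`, `verify_labels`, `expand_from_errors`) are hypothetical placeholders for the paper's GCP, JGLV, and EDDE modules.

```python
# Toy sketch of LoopTool's cycle: probe -> verify labels -> expand -> retrain.
# All names and the dictionary-based "model" are illustrative assumptions.

def probe_capabilities(model, dataset):
    """GCP (toy): split samples into mastered vs. failed for the current model."""
    mastered = [s for s in dataset if model(s["task"]) == s["label"]]
    failed = [s for s in dataset if model(s["task"]) != s["label"]]
    return mastered, failed

def verify_labels(judge, dataset):
    """JGLV (toy): let a judge model confirm or correct each annotation."""
    return [{**s, "label": judge(s["task"], s["label"])} for s in dataset]

def expand_from_errors(failed):
    """EDDE (toy): synthesize harder variants of the samples the model failed."""
    return [{"task": s["task"] * 2, "label": s["label"] * 2} for s in failed]

def loop_tool(dataset, judge, rounds=3):
    """Run the closed data-training loop for a fixed number of rounds."""
    memory = {}                # toy "model": memorized task -> label mapping
    model = memory.get
    for _ in range(rounds):
        _, failed = probe_capabilities(model, dataset)   # 1) diagnose failures
        dataset = verify_labels(judge, dataset)          # 2) purify labels
        dataset = dataset + expand_from_errors(failed)   # 3) grow hard samples
        memory.update({s["task"]: s["label"] for s in dataset})  # "retrain"
        model = memory.get
    return model, dataset
```

The key design point the sketch captures is that data synthesis is conditioned on the model's current failures, so each round both cleans the dataset and steers new samples toward remaining weaknesses.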
Why it matters?
This research shows that by constantly refining both the training data and the model itself, LLMs can become significantly better at using tools. This is important because tool use allows LLMs to tackle more complex problems and perform tasks beyond just generating text, and it does so in a cost-effective way.
Abstract
Augmenting Large Language Models (LLMs) with external tools enables them to execute complex, multi-step tasks. However, tool learning is hampered by static synthetic data pipelines where data generation and model training are executed as two separate, non-interactive processes. This approach fails to adaptively focus on a model's specific weaknesses and allows noisy labels to persist, degrading training efficiency. We introduce LoopTool, a fully automated, model-aware data evolution framework that closes this loop by tightly integrating data synthesis and model training. LoopTool iteratively refines both the data and the model through three synergistic modules: (1) Greedy Capability Probing (GCP) diagnoses the model's mastered and failed capabilities; (2) Judgement-Guided Label Verification (JGLV) uses an open-source judge model to find and correct annotation errors, progressively purifying the dataset; and (3) Error-Driven Data Expansion (EDDE) generates new, challenging samples based on identified failures. This closed-loop process operates within a cost-effective, open-source ecosystem, eliminating dependence on expensive closed-source APIs. Experiments show that our 8B model trained with LoopTool significantly surpasses its 32B data generator and achieves new state-of-the-art results on the BFCL-v3 and ACEBench benchmarks for its scale. Our work demonstrates that closed-loop, self-refining data pipelines can dramatically enhance the tool-use capabilities of LLMs.