NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Han Han, Tong Zhu, Xiang Zhang, Mengsong Wu, Hao Xiong, Wenliang Chen

2024-10-16

Summary

This paper introduces NesTools, a new dataset designed to evaluate how well large language models (LLMs) can learn to use tools in nested ways, where one tool's output is used as input for another.
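To make the idea of nesting concrete, here is a minimal sketch in Python. It is not taken from NesTools: the two tools, their return values, and the `$0.lat` reference notation are all hypothetical, chosen only to show a later call consuming an earlier call's response.

```python
# Hypothetical tools, invented for illustration; not part of NesTools.
def find_city(name: str) -> dict:
    return {"city": name, "lat": 31.3, "lon": 120.6}

def get_forecast(lat: float, lon: float) -> str:
    return f"Sunny at ({lat}, {lon})"

TOOLS = {"find_city": find_city, "get_forecast": get_forecast}

def resolve(value, responses):
    """Replace a reference such as '$0.lat' with field 'lat' of response 0.

    The '$index.field' notation is an assumption made for this sketch.
    """
    if isinstance(value, str) and value.startswith("$"):
        index, _, field = value[1:].partition(".")
        result = responses[int(index)]
        return result[field] if field else result
    return value

def run_calls(calls):
    """Execute calls in order, filling references from earlier responses."""
    responses = []
    for call in calls:
        args = {key: resolve(val, responses) for key, val in call["args"].items()}
        responses.append(TOOLS[call["tool"]](**args))
    return responses

# The second call is nested: its arguments come from the first call's output.
calls = [
    {"tool": "find_city", "args": {"name": "Suzhou"}},
    {"tool": "get_forecast", "args": {"lat": "$0.lat", "lon": "$0.lon"}},
]
responses = run_calls(calls)
print(responses[-1])  # prints "Sunny at (31.3, 120.6)"
```

A model that handles this case has to pick both tools, order them correctly, and wire the right fields of the first response into the second call, which is the kind of dependency a nested tool learning benchmark is meant to probe.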

What's the problem?

Research on how LLMs handle nested tool learning is limited because existing datasets contain too few relevant examples. Most benchmarks focus on simple tool use rather than the more complex scenarios where the output of one tool call feeds into the next, which makes it hard to assess the models' true capabilities.

What's the solution?

To fill this gap, the authors created NesTools, which includes a large number of examples of nested tool calls with various structures. They developed an automatic method to generate this data and ensured its quality through manual review. This dataset allows researchers to better evaluate how LLMs perform when they need to use multiple tools together, reflecting real-world situations more accurately.

Why does it matter?

This research is important because it provides a new benchmark for testing LLMs' abilities in nested tool learning. By improving our understanding of how these models can effectively use tools in complex scenarios, NesTools can help enhance the development of AI systems that require sophisticated tool interactions, making them more useful in practical applications.

Abstract

Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested order, where a later tool call may take the response of an earlier call as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluations. NesTools comprises a novel automatic data generation method to construct large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. Therefore, NesTools can serve as a new benchmark to evaluate the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with the complex nested tool learning task.
