APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

2024-06-26

Summary

This paper introduces APIGen, an automated pipeline for creating high-quality training datasets for function-calling models. These models let an AI system interact with external applications by emitting structured calls to specific APIs rather than plain text.
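To make this concrete, here is a hypothetical sketch of what a single function-calling training example might look like; the field names and the `get_weather` tool are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical function-calling training example (schema is illustrative).
# The model receives a natural-language query plus a list of available tools,
# and must emit a structured call to the right function with valid arguments.
example = {
    "query": "What's the weather in Tokyo on 2024-06-27?",
    "tools": [
        {
            "name": "get_weather",  # assumed tool, not from the paper
            "description": "Fetch the weather forecast for a city on a date.",
            "parameters": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO date"},
            },
        }
    ],
    "answers": [
        {"name": "get_weather", "arguments": {"city": "Tokyo", "date": "2024-06-27"}}
    ],
}
```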

What's the problem?

Function-calling models need diverse and reliable training data to learn effectively, but many existing datasets are low-quality or never verified against real API executions. Models trained on such data generalize poorly to new or different tasks, which makes them unreliable in real-world applications.

What's the solution?

The authors developed APIGen, which generates and verifies data using 3,673 executable APIs across 21 categories. Every generated entry passes through a three-stage verification process: format checking (is the output well-structured?), actual function execution (does the call run successfully?), and semantic verification (do the results actually answer the query?). This filtering yields a dataset of 60,000 high-quality entries. Models trained on it, with only 7B parameters, achieved state-of-the-art results on the Berkeley Function-Calling Benchmark, outperforming several GPT-4 models, and even a 1B model surpassed GPT-3.5-Turbo and Claude-3 Haiku. A sketch of the verification loop follows below.
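Here is a minimal sketch of what such a hierarchical three-stage verifier might look like, assuming a registry of callable APIs; all helper names and checks are illustrative, not the authors' implementation.

```python
import json


def format_check(raw: str) -> dict | None:
    """Stage 1: reject entries that are not well-formed JSON with expected fields."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(k in entry for k in ("query", "answers")):
        return None
    return entry


def execution_check(entry: dict, api_registry: dict) -> list | None:
    """Stage 2: actually run each generated call against an executable API."""
    results = []
    for call in entry["answers"]:
        fn = api_registry.get(call["name"])
        if fn is None:
            return None  # call references a tool that does not exist
        try:
            results.append(fn(**call["arguments"]))
        except Exception:
            return None  # call crashed or arguments were invalid
    return results


def semantic_check(entry: dict, results: list) -> bool:
    """Stage 3: check that execution results plausibly answer the query.
    Placeholder logic; in practice this could be an LLM judge or rule-based scorer."""
    return all(r is not None for r in results)


def verify(raw: str, api_registry: dict) -> dict | None:
    """Keep an entry only if it survives all three hierarchical stages."""
    entry = format_check(raw)
    if entry is None:
        return None
    results = execution_check(entry, api_registry)
    if results is None:
        return None
    if not semantic_check(entry, results):
        return None
    return entry
```

The hierarchical ordering matters: cheap structural checks run first, so expensive live API calls and semantic judgments are only spent on entries that are already well-formed.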

Why it matters?

This research is important because it provides a scalable way to generate verified training data for AI systems that call functions through APIs. By improving the quality and reliability of these datasets, APIGen enables stronger performance for AI applications in fields such as finance, healthcare, and customer service, leading to more effective and efficient systems for complex tasks.

Abstract

The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each entry in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agents. The dataset is available on Hugging Face: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/
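As a usage note, the released dataset can be pulled with the standard Hugging Face `datasets` library; the snippet below assumes a default `train` split and may require accepting the dataset's license terms on the Hub first.

```python
# Sketch: load the released 60k-entry dataset from the Hugging Face Hub.
# Requires `pip install datasets`; access may require agreeing to the
# dataset's license on huggingface.co beforehand.
from datasets import load_dataset

ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # inspect one verified function-calling entry
```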