Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, Zheng Li, Zhengyang Wang, Pei Chen, Ruijie Wang, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang

2025-02-12

Summary

This paper introduces Hephaestus, a new way to make AI language models better at acting as agents, especially when using tools and interacting with computer programs through their interfaces (APIs).

What's the problem?

Current AI agents that use large language models (LLMs) often struggle to learn new skills without losing their ability to handle a wide range of tasks. This is because there isn't much training data specifically designed for teaching AI how to act as agents that can use tools and adapt to different situations.

What's the solution?

The researchers created Hephaestus-Forge, a huge collection of training data designed to teach AI models how to call APIs, reason through problems, and adapt to feedback. The dataset covers more than 76,000 different APIs and includes both documentation and examples of how to use them. They then used this data to continually pre-train an AI model called Hephaestus, improving its agent skills without making it forget what it already knew.
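The core idea of continual pre-training here is blending agent-specific data (tool documentation and function-calling examples) with general text at a tuned ratio, so the model gains new skills without forgetting old ones. A rough sketch of such mixing, where the function name, sample strings, and the 0.6 ratio are all illustrative stand-ins rather than the paper's actual recipe:

```python
import random

def mix_corpora(agent_data, general_data, agent_ratio=0.6, n_samples=10, seed=0):
    """Build a training stream that interleaves agent-specific samples
    (tool docs, API-calling trajectories) with general text.
    The ratio would be chosen by studying scaling laws, as the paper does."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_samples):
        # With probability agent_ratio, draw an agent-specific sample;
        # otherwise draw general text to preserve broad capabilities.
        if rng.random() < agent_ratio:
            stream.append(rng.choice(agent_data))
        else:
            stream.append(rng.choice(general_data))
    return stream

# Hypothetical placeholder samples:
agent_data = ["<tool doc: weather API>", "<trajectory: call search(), read result, answer>"]
general_data = ["<general web text>", "<code snippet>"]
batch = mix_corpora(agent_data, general_data)
```

The point of the mixture is the trade-off the paper describes: too little general data and the model loses generality; too little agent data and it never gains the new capabilities.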

Why it matters?

This matters because it could lead to AI assistants that are much more capable of performing real-world tasks, like interacting with software or solving complex problems. Hephaestus performed better than many other AI models, including some commercial ones, which shows that this approach could make AI tools more useful and adaptable for a wide range of applications.

Abstract

Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. Hephaestus-Forge comprises 103B tokens of agent-specific data encompassing 76,537 APIs, including both tool documentation to introduce knowledge of API functions and function calling trajectories to strengthen intrinsic reasoning. To explore effective training protocols, we investigate scaling laws to identify the optimal recipe in data mixing ratios. By continual pre-training on Hephaestus-Forge, Hephaestus outperforms small- to medium-scale open-source LLMs and rivals commercial LLMs on three agent benchmarks, demonstrating the effectiveness of our pre-training corpus in enhancing fundamental agentic capabilities and generalization of LLMs to new tasks or environments.