Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang

2025-08-20

Summary

This paper introduces Chain-of-Agents (CoA), an approach in which a single AI model solves complex problems by acting like a team of cooperating agents. According to the authors, this is more efficient than existing multi-agent systems and, unlike them, can be improved by training on data.

What's the problem?

Most existing multi-agent AI systems are assembled by hand from prompts and workflows on top of complicated agent frameworks. This makes them computationally inefficient and less capable, and because their behavior comes from manual engineering rather than training, they cannot improve by learning from data.

What's the solution?

The researchers created Chain-of-Agents (CoA), an approach in which one AI model simulates the collaboration of multiple specialized agents, dynamically switching between tools and roles to solve a problem step by step. To train such models, which they call Agent Foundation Models (AFMs), they first distilled the behavior of state-of-the-art multi-agent systems into example trajectories and fine-tuned the model on them. They then improved the models further with reinforcement learning on tasks whose answers can be verified.

Why does it matter?

Chain-of-Agents lets a single model handle complex, multi-step tasks that previously required a hand-built multi-agent system. The authors report new state-of-the-art results on benchmarks in both web agent and code agent settings, and they release the model weights, training and evaluation code, and training data as an open starting point for future research on agent models.

Abstract

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem-solving in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents) within one model. In chain-of-agents problem-solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework to distill state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then use agentic reinforcement learning on verifiable agentic tasks to further improve the models' capabilities in chain-of-agents problem-solving. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFM establishes new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We fully open-source the entire research, including the model weights, code for training and evaluation, and the training data, which offers a solid starting point for future research on agent models and agentic RL.
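
The abstract's idea of one model that "dynamically activates different tool agents and role-playing agents" can be pictured with a toy loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `Step`, `Trajectory`, `toy_policy`, `run_chain`, and both tools are hypothetical names, and a hand-written rule stands in for the trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    agent: str    # role or tool agent active at this step (hypothetical labels)
    content: str  # text emitted by that agent, or a tool's observation


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)


# Hypothetical tool agents: each maps a query string to an observation string.
def search_tool(query: str) -> str:
    corpus = {"capital of France": "Paris"}  # toy stand-in for a web search
    return corpus.get(query, "no result")


def calculator_tool(expr: str) -> str:
    a, op, b = expr.split()  # expects input such as "2 + 3"
    x, y = float(a), float(b)
    return str({"+": x + y, "-": x - y, "*": x * y}[op])


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_tool,
    "calculator": calculator_tool,
}


def toy_policy(traj: Trajectory) -> Step:
    """Rule-based stand-in for the single model (assumption, not a trained AFM).

    It reads the trajectory so far and decides which agent acts next.
    """
    if not traj.steps:
        return Step("planner", "plan: search for the fact, then answer")
    last = traj.steps[-1]
    if last.agent == "planner":
        return Step("search", traj.question)
    # After a tool observation, hand over to the answering role.
    return Step("answer", last.content)


def run_chain(question: str,
              policy: Callable[[Trajectory], Step],
              max_turns: int = 8) -> Trajectory:
    """One policy, one trajectory: roles and tools are interleaved as steps."""
    traj = Trajectory(question)
    for _ in range(max_turns):
        step = policy(traj)
        traj.steps.append(step)
        if step.agent == "answer":
            break
        if step.agent in TOOLS:
            # A tool agent was activated: run it and record the observation.
            observation = TOOLS[step.agent](step.content)
            traj.steps.append(Step("observation", observation))
    return traj
```

Calling `run_chain("capital of France", toy_policy)` yields a single trajectory whose steps pass from a planner role to a search tool, its observation, and a final answer. In the paper's framing, trajectories of this general shape are what a model would be fine-tuned on; the actual trajectory format and agent set are not specified in this summary.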
