CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, Lei Bai
2025-10-10
Summary
This paper explores how to make AI agents, powered by large language models, continuously get better at tasks on their own, a process called self-evolution.
What's the problem?
Existing methods for improving these AI agents after their initial training either require a lot of human guidance in the form of reward signals, or they have the model generate rewards for itself based on its own internal judgments. Neither approach really mimics how humans learn: we usually improve by discussing problems with and learning from each other.
What's the solution?
The researchers developed a system called CoMAS, which stands for Co-Evolving Multi-Agent Systems. This system lets multiple AI agents interact with each other and learn from those interactions. The agents essentially 'discuss' problems, and an LLM acts as a judge, assigning rewards based on the quality of each contribution to the discussion. Each agent then updates its own policy with reinforcement learning using those rewards, so all the agents improve together through collaboration, without needing external supervision.
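To make the loop concrete, here is a minimal sketch of the interaction-reward idea described above. Everything here is a hypothetical stand-in, not the paper's actual implementation: `Agent.respond` would really be an LLM generation call, `judge` would be an LLM-as-a-judge scoring prompt, and the banked rewards would feed an RL update (e.g. a policy-gradient step) rather than a simple log.

```python
class Agent:
    """Stand-in for an LLM-based agent; in CoMAS each agent is a trainable policy."""

    def __init__(self, name):
        self.name = name
        self.reward_log = []  # rewards banked here would drive the RL update

    def respond(self, problem, discussion):
        # Placeholder for an LLM call conditioned on the discussion so far.
        return f"{self.name}: proposal for '{problem}' at turn {len(discussion)}"


def judge(messages):
    """Placeholder LLM-as-a-judge: returns one quality score per contribution.

    A real judge would read the whole discussion and score how much each
    message advanced it; here we emit a dummy constant score.
    """
    return [1.0 for _ in messages]


def discussion_round(agents, problem, turns=2):
    """Run a multi-turn discussion and assign interaction rewards."""
    discussion = []  # list of (agent, message) pairs
    for _ in range(turns):
        for agent in agents:
            discussion.append((agent, agent.respond(problem, discussion)))
    # Score the discussion and let each agent bank the reward for its own turns.
    scores = judge([msg for _, msg in discussion])
    for (agent, _), score in zip(discussion, scores):
        agent.reward_log.append(score)
    return discussion


agents = [Agent("A"), Agent("B"), Agent("C")]
transcript = discussion_round(agents, "solve x^2 = 4")
```

Note that the reward here is decentralized: each agent collects credit only for its own contributions, which is what lets the scheme scale as more (and more diverse) agents join the discussion.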
Why it matters?
This research is important because it provides a new and effective way for AI agents to improve themselves. By learning from each other, these agents can become more capable and adaptable, and the system is designed to work well even with a large number of agents, paving the way for more advanced and autonomous AI systems.
Abstract
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.