
Multi-Agent Evolve: LLM Self-Improve through Co-evolution

Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhan, Mostofa Patwary, Jiaxuan You

2025-10-28


Summary

This paper introduces a new method called Multi-Agent Evolve (MAE) to improve how well large language models (LLMs) can reason and solve problems, like math questions or general knowledge quizzes.

What's the problem?

Currently, making LLMs better at reasoning often requires large amounts of human-labeled data, which is expensive and doesn't scale to all kinds of problems. Some newer methods use 'self-play,' where the LLM practices on its own, but these usually need a grounded environment, such as a game engine or a Python interpreter, to provide feedback. Without such an environment, it is hard to make self-play work in open-ended domains.

What's the solution?

The researchers created MAE, which uses three different 'roles,' all played by the same LLM: a Proposer that creates questions, a Solver that tries to answer them, and a Judge that evaluates the answers. These roles continually interact and improve through reinforcement learning: each is 'rewarded' for good performance. This allows the LLM to learn and get better at reasoning without needing humans to provide labeled data or a task-specific environment.
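The loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: `call_llm` is a stand-in for the single shared LLM, and the prompts and reward shaping (Judge score for the Solver, a medium-difficulty bonus for the Proposer) are assumptions made for clarity.

```python
import random

def call_llm(role, prompt):
    """Placeholder for one LLM instantiated in three roles.
    A real system would send role-specific prompts to the same model."""
    if role == "proposer":
        return f"Question derived from: {prompt}"
    if role == "solver":
        return f"Answer to: {prompt}"
    if role == "judge":
        return round(random.random(), 2)  # quality score in [0, 1]
    raise ValueError(f"unknown role: {role}")

def mae_step(seed_topic):
    """One Proposer -> Solver -> Judge interaction."""
    question = call_llm("proposer", seed_topic)
    answer = call_llm("solver", question)
    score = call_llm("judge", f"Q: {question}\nA: {answer}")
    # Illustrative reward shaping (an assumption, not the paper's exact
    # scheme): the Solver is rewarded by the Judge's score, while the
    # Proposer is rewarded for questions of medium difficulty, i.e.
    # neither trivially easy nor unsolvable.
    solver_reward = score
    proposer_reward = 1.0 - abs(score - 0.5) * 2
    return {
        "question": question,
        "answer": answer,
        "solver_reward": solver_reward,
        "proposer_reward": proposer_reward,
    }

result = mae_step("geometry word problems")
```

In a full system, these rewards would feed a reinforcement-learning update of the shared model's weights, so all three roles co-evolve from the same improving LLM.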

Why it matters?

MAE is important because it offers a way to make LLMs smarter and more capable without relying on large, human-created datasets. This makes the process more scalable and adaptable to a wider range of tasks, potentially leading to more powerful and generally intelligent AI systems.

Abstract

Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs heavily relies on human-curated datasets and verifiable rewards, which limit their scalability and generality. Recent Self-Play RL methods, inspired by the success of the paradigm in games and Go, aim to enhance LLM reasoning capabilities without human-annotated data. However, their methods primarily depend on a grounded environment for feedback (e.g., a Python interpreter or a game engine); extending them to general domains remains challenging. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A. The core design of MAE is based on a triplet of interacting agents (Proposer, Solver, Judge) that are instantiated from a single LLM, and applies reinforcement learning to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both while co-evolving. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.54% on multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.