MALT: Improving Reasoning with Multi-Agent LLM Training
Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt
2024-12-04

Summary
This paper introduces MALT (Multi-Agent LLM Training), a new training method for large language models (LLMs) that teaches multiple specialized models to work together to solve reasoning problems more effectively.
What's the problem?
While LLMs have shown strong abilities on reasoning tasks, they are usually trained and deployed as single models: each works alone, and humans must critique and refine its outputs. This single-model setup limits their potential to collaborate on complex problems, a capability that is essential for building advanced autonomous systems.
What's the solution?
MALT (Multi-Agent LLM Training) changes this by training a team of specialized LLMs, each with a distinct role: a generator proposes solutions, a verifier checks them, and a refinement model improves them. This gives the models a structured way to work together. The researchers also developed a synthetic data generation process that expands many candidate trajectories per problem, so the models can learn from both successful and unsuccessful attempts. Trained in this collaborative manner, the models reason correctly more often across a range of tasks.
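To make the setup concrete, here is a minimal sketch of the sequential three-role pipeline at inference time. All names (malt_pipeline, generator, verifier, refiner) are our own illustrative choices rather than identifiers from the paper, and the stand-in lambdas would be replaced by calls to three fine-tuned LLMs.

```python
# Minimal sketch of MALT's sequential three-role pipeline (illustrative names).
from typing import Callable

def malt_pipeline(
    question: str,
    generator: Callable[[str], str],          # proposes an initial solution
    verifier: Callable[[str, str], str],      # critiques the proposed solution
    refiner: Callable[[str, str, str], str],  # produces the refined final answer
) -> str:
    """Run one generator -> verifier -> refiner pass over a question."""
    draft = generator(question)
    critique = verifier(question, draft)
    return refiner(question, draft, critique)

# Toy usage with stand-in callables; real use would wrap three fine-tuned LLMs.
if __name__ == "__main__":
    gen = lambda q: f"Draft answer to: {q}"
    ver = lambda q, d: f"Critique of: {d}"
    ref = lambda q, d, c: f"Refined answer given: {c}"
    print(malt_pipeline("What is 2 + 2?", gen, ver, ref))
```

The key design point is that each role sees the outputs of the roles before it, so errors caught by the verifier can be corrected by the refinement model rather than passed on to the user.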
Why it matters?
This research is significant because it paves the way for AI systems that work together like a team. By improving how LLMs collaborate, MALT can lead to better performance on complex reasoning tasks in areas such as education, research, and real-world problem solving, ultimately making AI more capable and useful in everyday applications.
Abstract
Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly-trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, verifier, and refinement model iteratively solving problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome-based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and commonsense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.
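To illustrate the trajectory-expansion and credit-assignment ideas from the abstract, here is a hedged sketch. The branching factor, the threshold rule for marking a node positive, and every name in the code are assumptions made for illustration; the paper's exact procedure may differ.

```python
# Hedged sketch of trajectory expansion and outcome-based credit assignment.
# Branching factor, threshold rule, and all names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    role: str                      # "generator", "verifier", or "refiner"
    text: str
    children: List["Node"] = field(default_factory=list)
    positive: bool = False         # label used to split trajectories later

def expand(question: str, sample: Callable[[str, str], str], branch: int = 2) -> Node:
    """Build a generator -> verifier -> refiner tree by sampling `branch`
    outputs at every step (the trajectory-expansion step)."""
    root = Node("question", question)
    for _ in range(branch):
        g = Node("generator", sample("generator", question))
        for _ in range(branch):
            v = Node("verifier", sample("verifier", g.text))
            for _ in range(branch):
                v.children.append(Node("refiner", sample("refiner", v.text)))
            g.children.append(v)
        root.children.append(g)
    return root

def assign_credit(node: Node, is_correct: Callable[[str], bool], threshold: float = 0.5) -> float:
    """Score each node by the fraction of its leaf descendants whose final
    answer is correct; nodes at or above `threshold` are labeled positive."""
    if not node.children:          # leaf: a refiner's final answer
        score = 1.0 if is_correct(node.text) else 0.0
    else:
        score = sum(assign_credit(c, is_correct, threshold) for c in node.children) / len(node.children)
    node.positive = score >= threshold
    return score
```

The positive/negative labels produced this way are what would let post-training use both kinds of trajectories, as the abstract describes: positively labeled outputs can serve as training targets for each role, while negatively labeled siblings provide contrastive signal.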