Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch
2025-01-13

Summary
This paper introduces 'multiagent finetuning', a new way to improve AI language models in which a group of models, all trained from the same base, help each other get better at solving problems.
What's the problem?
Current AI language models are limited by their training data, and attempts to improve them by having them generate their own new training data eventually hit diminishing returns. This is because a single self-improving model converges on the same kinds of answers, losing the variety of reasoning it needs to keep learning.
What's the solution?
The researchers created a system where multiple AI models, all starting from the same base, work together to improve. Each model is trained on different data generated through interactions with the other models. This approach allows each model to develop its own specialties and keeps a diverse range of problem-solving methods (called 'reasoning chains') available. The models can continue improving over many rounds of training, unlike single-model methods that tend to plateau.
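To make the idea concrete, here is a minimal sketch of what one round of this process could look like. It is illustrative only: `BaseLM`, `generate`, and `finetune` are hypothetical stand-ins for a real language model, its sampling call, and a supervised finetuning step, and the "each agent sees the others' drafts, then revises" interaction is a simplified placeholder for whatever multiagent interaction is actually used.

```python
# Minimal sketch of one round of multiagent finetuning (illustrative only).
# BaseLM, generate, and finetune are hypothetical stand-ins, not the paper's code.
import copy
import random

class BaseLM:
    """Stand-in for a pretrained language model."""
    def generate(self, prompt, context=""):
        # A real model would return a reasoning chain; here we fake an answer.
        return f"answer::{random.randint(0, 3)}"

def finetune(agent, data):
    # Placeholder: a real implementation would run gradient updates on `data`.
    return copy.deepcopy(agent)

def multiagent_round(agents, problems):
    """Each agent answers every problem while seeing the other agents'
    answers, then is finetuned only on the data it generated itself."""
    per_agent_data = [[] for _ in agents]
    for problem in problems:
        # Step 1: every agent drafts an independent answer.
        drafts = [a.generate(problem) for a in agents]
        # Step 2: each agent revises its answer given the others' drafts
        # (a debate-style interaction; the exact protocol can vary).
        for i, agent in enumerate(agents):
            others = [d for j, d in enumerate(drafts) if j != i]
            revised = agent.generate(problem, context=" | ".join(others))
            per_agent_data[i].append((problem, revised))
    # Step 3: independent finetuning, so each agent specializes on its own data.
    return [finetune(agent, data) for agent, data in zip(agents, per_agent_data)]

if __name__ == "__main__":
    base = BaseLM()
    agents = [copy.deepcopy(base) for _ in range(3)]  # same base model, N copies
    for _ in range(2):  # repeated rounds of self-improvement
        agents = multiagent_round(agents, problems=["2+2", "3*7"])
    print(f"{len(agents)} specialized agents after multiple rounds")
```

The key design choice the paper highlights is in step 3: because each copy is updated only on its own interaction data, the copies drift apart and specialize rather than collapsing into one shared behavior.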
Why it matters?
This method matters because it allows AI language models to keep improving beyond their initial training, potentially leading to more capable and versatile AI systems. It helps solve the problem of models getting stuck or overfitting to limited data. By maintaining diversity in problem-solving approaches, these AI systems could become better at handling a wide range of tasks and might be more adaptable to new challenges. This could lead to significant advancements in AI's ability to reason and solve complex problems across various fields.
Abstract
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of fine-tuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.
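The abstract's central claim is that the society of models "preserves diverse reasoning chains." As a rough illustration of what that means in practice, the toy metric below scores how many distinct final answers a group of agents produces for one problem; this is just an assumed example for intuition, not the diversity measure used in the paper.

```python
# Illustrative sketch of checking answer diversity across a group of agents.
# This "unique answer ratio" is an assumed toy metric, not the paper's measure.
from collections import Counter

def answer_diversity(agent_answers):
    """Fraction of distinct final answers among the agents for one problem.
    1.0 means every agent answered differently; 1/N means all agents collapsed
    to the same answer (the failure mode of single-agent self-improvement)."""
    counts = Counter(agent_answers)
    return len(counts) / len(agent_answers)

if __name__ == "__main__":
    # Example: four agents answering the same reasoning problem.
    collapsed = ["42", "42", "42", "42"]          # diversity has collapsed
    diverse = ["42", "41", "42", "forty-two"]     # multiple reasoning outcomes
    print(answer_diversity(collapsed))  # 0.25
    print(answer_diversity(diverse))    # 0.75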