
Latent Collaboration in Multi-Agent Systems

Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, Ling Yang

2025-11-27


Summary

This paper introduces a new way for large language models (LLMs) to work together as a team, called LatentMAS. Instead of communicating through text like current systems, LatentMAS allows the models to share information directly through their internal 'thoughts' – the hidden patterns they use to understand and generate text.

What's the problem?

Existing systems that use multiple LLMs to solve problems rely on the models exchanging text messages to coordinate. This text-based communication is slow, loses important details when each model's internal reasoning is converted to and from text, and is inefficient. It's like trying to build something complex with a group while only being able to pass notes back and forth.

What's the solution?

LatentMAS avoids text communication altogether. Each LLM agent generates internal 'latent thoughts' – the numerical hidden representations of its reasoning – and shares these directly with the other agents through a 'latent working memory'. This memory acts as a shared workspace where agents can read each other's internal states without any loss of information. The researchers also provide theoretical analyses showing that this approach is more expressive and less costly than text-based systems, and then demonstrate it empirically.
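The mechanism can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: `toy_forward` is a hypothetical stand-in for one transformer pass (a real agent would run a full LLM and use its last-layer hidden state), and the shared list stands in for the latent working memory.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8          # toy hidden-state size
LATENT_STEPS = 4    # latent "thoughts" generated per agent

def toy_forward(W, h):
    """Stand-in for one model pass: maps a hidden state to the next
    last-layer hidden state. A real agent would run the full LLM here."""
    return np.tanh(W @ h)

def latent_reasoning(W, h0, memory, steps=LATENT_STEPS):
    """Auto-regressive latent thinking: each step feeds the previous
    hidden state back in as the next input, never decoding to text.
    Every state is appended to the shared latent working memory so
    other agents can read it without lossy text serialization."""
    h = h0
    for _ in range(steps):
        h = toy_forward(W, h)
        memory.append(h)   # shared workspace, no text round-trip
    return h

# Two toy "agents" (weight matrices) collaborating via one shared memory.
shared_memory = []
W_a = rng.standard_normal((HIDDEN, HIDDEN))
W_b = rng.standard_normal((HIDDEN, HIDDEN))
h0 = rng.standard_normal(HIDDEN)

h_a = latent_reasoning(W_a, h0, shared_memory)
# Agent B conditions directly on agent A's final latent state.
h_b = latent_reasoning(W_b, shared_memory[len(shared_memory) - 1], shared_memory)

print(len(shared_memory))  # 8 latent thoughts in memory, 4 per agent
```

The key contrast with text-based MAS: agent B starts from agent A's hidden state itself, rather than from a text summary of it, so nothing is lost in serialization.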

Why it matters?

This research is important because it significantly improves how well LLMs can collaborate. By sharing internal representations directly, LatentMAS achieves higher accuracy, uses far fewer output tokens, and runs much faster end-to-end than text-based methods. This could unlock more powerful AI systems capable of tackling complex problems that are beyond the reach of a single model.

Abstract

Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thoughts generation through last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent's internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4x-4.3x faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.