The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
2025-09-18
Summary
This paper investigates a new kind of privacy problem that arises when multiple large language models (LLMs) work together. It shows how seemingly harmless answers, when pieced together over a conversation, can cumulatively reveal sensitive information that no single response exposes on its own.
What's the problem?
Existing privacy concerns with LLMs focus on issues like the model memorizing training data or revealing information in response to a single question. This paper highlights a subtler issue: information leaks gradually as LLMs participate in a series of interactions. Even if each individual response seems safe, an adversary with some auxiliary knowledge can combine the responses to piece together sensitive details. This is especially concerning in systems where multiple LLMs are talking to each other or to users.
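A toy sketch of this effect (the data, attribute names, and candidate set below are hypothetical illustrations, not the paper's framework or dataset): each answer is harmless alone, but composed with auxiliary knowledge they single out one person's sensitive attribute.

```python
# Auxiliary knowledge the adversary already holds: a small candidate population.
candidates = [
    {"name": "A", "dept": "Radiology", "city": "Austin", "diagnosis": "none"},
    {"name": "B", "dept": "Radiology", "city": "Boston", "diagnosis": "diabetes"},
    {"name": "C", "dept": "Oncology",  "city": "Boston", "diagnosis": "none"},
]

# Individually innocuous answers collected across separate turns or agents.
answers = [
    ("dept", "Radiology"),   # "The person works in Radiology."
    ("city", "Boston"),      # "The person lives in Boston."
]

# Composing the answers narrows the candidate set.
remaining = [c for c in candidates if all(c[k] == v for k, v in answers)]

if len(remaining) == 1:
    # The composition, not any single answer, reveals the sensitive attribute.
    print(f"Re-identified {remaining[0]['name']}: diagnosis={remaining[0]['diagnosis']}")
```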
What's the solution?
The researchers propose two defenses against this 'compositional privacy leakage'. The first, the Theory-of-Mind (ToM) defense, has each responding agent anticipate how its answer could be exploited by an adversary before replying. The second, Collaborative Consensus Defense (CoDef), has responder agents consult peer agents, who vote based on a shared aggregated state of the conversation to stop sensitive details from spreading. The defenses are evaluated on queries designed to extract private information as well as benign queries, measuring how often sensitive compositions are blocked while benign tasks still succeed.
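A minimal sketch of the two defenses as prompt-level checks, assuming a generic `llm(prompt) -> str` call; the prompt wording, voting rule, and shared-state format here are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

def tom_defense(llm: Callable[[str], str], query: str, history: List[str]) -> str:
    """Theory-of-Mind style check: before answering, the agent anticipates how the
    answer could be combined with prior turns to expose private information."""
    risk = llm(
        "Conversation so far:\n" + "\n".join(history) +
        f"\nNew question: {query}\n"
        "Could answering, combined with what was already revealed, let an "
        "adversary infer sensitive information? Reply LEAK or SAFE."
    )
    if "LEAK" in risk.upper():
        return "I can't share that."
    return llm(query)

def codef_defense(llm: Callable[[str], str], query: str,
                  peers: List[Callable[[str], str]], shared_state: str) -> str:
    """Collaborative-consensus style check: peer agents vote on a shared
    aggregated state; a majority of BLOCK votes withholds the answer."""
    ballot = (
        f"Aggregated state of the collaboration:\n{shared_state}\n"
        f"Proposed query: {query}\nVote BLOCK or ALLOW."
    )
    votes = [peer(ballot) for peer in peers]
    if sum("BLOCK" in v.upper() for v in votes) > len(votes) // 2:
        return "Withheld by consensus."
    return llm(query)
```

In this reading, ToM makes a single agent more cautious (which can cost benign utility), while CoDef spreads the decision across peers, which is consistent with the trade-off reported in the abstract.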
Why it matters?
This research matters because as LLMs are increasingly deployed in collaborative systems, such as virtual assistants working together, this type of privacy risk grows accordingly. The findings provide practical strategies for building safer multi-agent LLM systems and for balancing privacy with the usefulness of these tools. The work shows that preventing leakage in individual responses isn't enough; we also need to consider how information accumulates over time and across interactions.
Abstract
As large language models (LLMs) become integral to multi-agent systems, new privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations. In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information, a phenomenon we term compositional privacy leakage. We present the first systematic study of such compositional privacy leaks and possible mitigation methods in multi-agent LLM systems. First, we develop a framework that models how auxiliary knowledge and agent interactions jointly amplify privacy risks, even when each response is benign in isolation. Next, to mitigate this, we propose and evaluate two defense strategies: (1) Theory-of-Mind defense (ToM), where defender agents infer a questioner's intent by anticipating how their outputs may be exploited by adversaries, and (2) Collaborative Consensus Defense (CoDef), where responder agents collaborate with peers who vote based on a shared aggregated state to restrict sensitive information spread. Crucially, we balance our evaluation across compositions that expose sensitive information and compositions that yield benign inferences. Our experiments quantify how these defense strategies differ in balancing the privacy-utility trade-off. We find that while chain-of-thought alone offers limited protection against leakage (~39% sensitive blocking rate), our ToM defense substantially improves sensitive query blocking (up to 97%) but can reduce benign task success. CoDef achieves the best balance, yielding the highest Balanced Outcome (79.8%), highlighting the benefit of combining explicit reasoning with defender collaboration. Together, our results expose a new class of risks in collaborative LLM deployments and provide actionable insights for designing safeguards against compositional, context-driven privacy leakage.
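For concreteness, one way the privacy-utility bookkeeping mentioned in the abstract could be computed; the exact definitions of sensitive blocking rate, benign task success, and Balanced Outcome are assumptions here and may differ from the paper's.

```python
def evaluate(trials):
    """Each trial: {"sensitive": bool, "blocked": bool, "task_success": bool}.
    Assumed definitions, for illustration only:
      - blocking rate: fraction of sensitive compositions that were blocked
      - benign success: fraction of benign compositions answered successfully
      - balanced outcome: fraction of all trials handled correctly on either side
    """
    sensitive = [t for t in trials if t["sensitive"]]
    benign = [t for t in trials if not t["sensitive"]]

    blocking_rate = sum(t["blocked"] for t in sensitive) / max(len(sensitive), 1)
    benign_success = sum(t["task_success"] and not t["blocked"] for t in benign) / max(len(benign), 1)

    correct = [t["blocked"] if t["sensitive"] else (t["task_success"] and not t["blocked"])
               for t in trials]
    balanced_outcome = sum(correct) / max(len(trials), 1)
    return blocking_rate, benign_success, balanced_outcome
```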