LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed, Mingmeng Geng, Michalis Vazirgiannis, Guokan Shang
2025-03-07
Summary
This paper examines how large language models (LLMs) can distort information when they repeatedly process their own outputs, much like the 'broken telephone' game, in which a message is gradually altered as it passes from person to person.
What's the problem?
When LLMs are used for multi-step tasks such as translation or rephrasing, small errors in meaning or facts can accumulate, making the final output less faithful to the original. This raises concerns about the reliability of AI-generated content, especially when it is processed iteratively.
What's the solution?
The researchers studied this 'broken telephone' effect by passing text through chains of LLM translations and rephrasing tasks. They found that distortion accumulates over iterations and depends on factors such as the choice of intermediate languages and the complexity of the chain. They also showed that controlling generation settings, such as temperature and prompt design, can reduce distortion and improve output quality. A sketch of such a translation chain appears below.
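To make the setup concrete, here is a minimal sketch of an iterative translation chain. This is an illustration under stated assumptions, not the paper's exact protocol: `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompts, languages, and metrics are placeholders.

```python
# Minimal sketch of a "broken telephone" translation chain.
# `call_llm` is a hypothetical stand-in for a real LLM API client;
# the paper's actual models, prompts, and settings may differ.

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError("plug in your LLM provider here")


def telephone_chain(text: str, languages: list[str],
                    temperature: float = 0.0) -> list[str]:
    """Translate `text` through each intermediate language and back
    to English, recording the English version after every round trip."""
    versions = [text]
    current = text
    for lang in languages:
        # Forward translation into the intermediate language...
        intermediate = call_llm(
            f"Translate the following text into {lang}. "
            f"Output only the translation.\n\n{current}",
            temperature=temperature,
        )
        # ...then back into English, completing one iteration.
        current = call_llm(
            "Translate the following text into English. "
            f"Output only the translation.\n\n{intermediate}",
            temperature=temperature,
        )
        versions.append(current)
    return versions
```

Distortion can then be quantified by comparing `versions[0]` against each later `versions[i]` with a similarity metric (for example, BLEU or embedding cosine similarity); the choice of metric here is an assumption, not taken from the paper.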
Why it matters?
This matters because, as AI becomes more involved in creating and sharing information, understanding its limitations is crucial. By identifying how iterative use distorts content and by finding ways to mitigate it, this research helps keep AI-generated information reliable and accurate over time.
Abstract
As large language models are increasingly responsible for online content, concerns arise about the impact of repeatedly processing their own outputs. Inspired by the "broken telephone" effect in chained human communication, this study investigates whether LLMs similarly distort information through iterative generation. Through translation-based experiments, we find that distortion accumulates over time, influenced by language choice and chain complexity. While degradation is inevitable, it can be mitigated through strategic prompting techniques. These findings contribute to discussions on the long-term effects of AI-mediated information propagation, raising important questions about the reliability of LLM-generated content in iterative workflows.