Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
2025-07-24
Summary
This paper examines how text-to-image diffusion models memorize parts of their training images and can reproduce them at generation time, which raises concerns about privacy and originality.
What's the problem?
Current defenses try to stop these models from replicating training images by pruning the model weights thought to store the memorized content. This is not enough: even small adjustments to the text embeddings can re-trigger the model and make it recreate those images again.
What's the solution?
The researchers found that memorization is spread across the model more diffusely than previously assumed, so pruning a small set of weights is not sufficient. They show that stronger methods are needed to truly eliminate memorized content from these models.
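The re-triggering effect can be illustrated with a toy sketch: a small Gaussian perturbation of a text embedding barely changes it in cosine similarity, which is why nearby points in embedding space can still elicit memorized outputs even after pruning. This is a minimal illustration, not the authors' code; the embedding, dimension, and function names are hypothetical.

```python
import numpy as np

def perturb_embedding(embedding, eps=0.01, seed=0):
    # Add a small Gaussian perturbation to a text embedding,
    # standing in for the minor embedding adjustments discussed
    # in the paper (illustrative only).
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=eps, size=embedding.shape)
    return embedding + noise

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in for a text embedding (dim 768, as in common CLIP encoders).
emb = np.random.default_rng(42).normal(size=768)
emb_perturbed = perturb_embedding(emb, eps=0.01)

# The perturbed embedding remains almost identical to the original,
# so a model conditioned on it can behave nearly the same way.
sim = cosine_similarity(emb, emb_perturbed)
print(f"cosine similarity: {sim:.4f}")
```

In a real pipeline such a perturbed embedding would be passed to the diffusion model in place of the original prompt embedding; the point here is only that tiny embedding-space moves are nearly invisible by similarity measures.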
Why it matters?
Addressing unwanted memorization matters for protecting copyright and privacy, and for ensuring that AI-generated images are original and safe to use.
Abstract
Pruning-based defenses in text-to-image diffusion models are insufficient: minor adjustments to text embeddings can re-trigger data replication, necessitating methods that truly remove memorized content.