Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein
2024-06-17
Summary
This paper introduces the 'goldfish loss', a training technique that helps large language models (LLMs) avoid memorizing their training data. This matters because memorization can lead to privacy violations and copyright issues when a model repeats exact phrases from its training set.
What's the problem?
Large language models can memorize specific pieces of text from the data they were trained on. When generating responses, they may repeat this text verbatim, which raises privacy concerns (if sensitive information is reproduced) and copyright concerns (if protected content is reproduced). This memorization makes these models less safe and reliable for real-world applications.
What's the solution?
To tackle this issue, the authors introduce the goldfish loss. Instead of computing the training loss on every token (the subword units a model reads and writes), they exclude a randomly chosen subset of tokens from the loss computation. Because the model never receives a learning signal for the excluded tokens, it cannot reproduce a complete verbatim sequence from its training data. The researchers tested this method on billion-scale models and found that it significantly reduced extractable memorization with little to no impact on downstream task performance.
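The idea above can be sketched in a few lines: given the per-token losses a model would normally average, drop a pseudorandom fraction of them before averaging. This is a minimal, framework-free illustration, not the authors' implementation; the function name, the `k` parameter (each token is dropped with probability 1/k), and the seeded static-random mask are assumptions for the sketch, and the paper also discusses hash-based masking so that repeated passages are masked consistently.

```python
import random

def goldfish_loss(token_nlls, k, seed=0):
    """Average next-token loss with ~1/k of tokens excluded.

    token_nlls: list of per-token negative log-likelihoods.
    k: each token is dropped from the loss with probability 1/k
       (illustrative static-random mask; hashed masks are an
       alternative discussed in the paper).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible mask
    kept = [nll for nll in token_nlls if rng.random() >= 1.0 / k]
    if not kept:  # degenerate case: every token was dropped
        return 0.0
    return sum(kept) / len(kept)
```

Dropped tokens contribute no gradient, so the model is never pushed toward predicting them exactly, yet the surrounding tokens still provide a dense training signal.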
Why it matters?
This research is important because it offers a way to make AI models safer and more ethical by reducing the risks associated with memorization. By using goldfish loss, developers can create language models that are less likely to repeat sensitive or copyrighted information, making them more suitable for use in various applications where privacy and compliance with laws are crucial.
Abstract
Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens is excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.