Measuring memorization through probabilistic discoverable extraction

Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov

2024-10-30

Summary

This paper presents a new method for measuring how much large language models (LLMs) memorize from their training data. It introduces a probabilistic approach that gives a more realistic picture of the risk that a model will reproduce sensitive information it was trained on.

What's the problem?

LLMs can memorize parts of their training data, which raises concerns about privacy and the unintentional extraction of sensitive information. Current methods for measuring how much these models memorize often underestimate the true extent of memorization because they rely on a single greedy-decoded sample per prompt, which doesn't capture the full range of outputs the model can actually produce.
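
To make that baseline concrete, here is a minimal sketch of the greedy check (discoverable extraction, Carlini et al., 2022), written against the Hugging Face transformers API. The model name and function are illustrative placeholders, not the paper's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates several models and sizes
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

@torch.no_grad()
def is_discoverably_extractable(prefix_ids, suffix_ids):
    """True if greedy decoding from the prefix reproduces the suffix exactly."""
    out = model.generate(
        prefix_ids,
        max_new_tokens=suffix_ids.shape[1],
        do_sample=False,  # single-sequence greedy sampling
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = out[:, prefix_ids.shape[1]:]  # keep only the newly generated tokens
    return bool(torch.equal(generated, suffix_ids))
```

Because this check asks only whether the single most likely continuation matches, it says nothing about how often a stochastic decoder would surface the same sequence.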

What's the solution?

The authors propose probabilistic discoverable extraction, which quantifies the probability of extracting a target sequence within a set of generated samples, under various sampling schemes and across multiple attempts. Because this accounts for how LLMs actually generate text and how users interact with them, it provides a more realistic measure of memorization. Their experiments show that the probabilistic measure can reveal higher memorization rates than the greedy baseline and shed light on how different sampling strategies affect the ability to extract information.
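
A rough illustration of the idea (a sketch under simplifying assumptions, not the authors' implementation): under plain temperature sampling, the probability q that a single generation reproduces the target suffix is the product of the per-token probabilities, and the probability of extracting it at least once in n independent attempts is 1 - (1 - q)^n. Truncated schemes such as top-k would require renormalizing each per-token distribution first. The function names below are hypothetical:

```python
import torch

@torch.no_grad()
def suffix_probability(model, prefix_ids, suffix_ids, temperature=1.0):
    """q = P(suffix | prefix) under plain temperature sampling: the product
    of each suffix token's probability given all preceding tokens."""
    full = torch.cat([prefix_ids, suffix_ids], dim=1)
    logits = model(full).logits / temperature
    log_probs = torch.log_softmax(logits, dim=-1)
    start = prefix_ids.shape[1] - 1  # logits at position t predict token t+1
    rows = log_probs[0, start:start + suffix_ids.shape[1], :]
    picked = rows.gather(1, suffix_ids[0].unsqueeze(1))
    return picked.sum().exp().item()

def extraction_probability(q, n):
    """Chance the target appears at least once in n independent samples."""
    return 1.0 - (1.0 - q) ** n
```

Even a tiny per-sample probability q can translate into a substantial extraction probability once n is large, which is why a single greedy sample can understate the risk.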

Why it matters?

This research is significant because it helps to better understand the memorization capabilities of LLMs, which is crucial for addressing privacy concerns. By improving measurement techniques, developers can create safer AI systems that are less likely to unintentionally recall sensitive information, ultimately making AI technology more reliable and trustworthy.

Abstract

Large language models (LLMs) are susceptible to memorizing training data, raising concerns due to the potential extraction of sensitive information. Current methods to measure memorization rates of LLMs, primarily discoverable extraction (Carlini et al., 2022), rely on single-sequence greedy sampling, potentially underestimating the true extent of memorization. This paper introduces a probabilistic relaxation of discoverable extraction that quantifies the probability of extracting a target sequence within a set of generated samples, considering various sampling schemes and multiple attempts. This approach addresses the limitations of reporting memorization rates through discoverable extraction by accounting for the probabilistic nature of LLMs and user interaction patterns. Our experiments demonstrate that this probabilistic measure can reveal cases of higher memorization rates compared to rates found through discoverable extraction. We further investigate the impact of different sampling schemes on extractability, providing a more comprehensive and realistic assessment of LLM memorization and its associated risks. Our contributions include a new probabilistic memorization definition, empirical evidence of its effectiveness, and a thorough evaluation across different models, sizes, sampling schemes, and training data repetitions.