UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
2025-02-24
Summary
This paper introduces UPCORE, a new method that helps AI language models forget specific information without losing their overall abilities.
What's the problem?
When AI models need to forget certain data for privacy or legal reasons, they often end up forgetting too much or too little. Forgetting too much makes the model less useful; forgetting too little means the information is not properly removed.
What's the solution?
The researchers created UPCORE, which carefully chooses which data points to include in the set to be forgotten. It identifies unusual 'outlier' data points that cause the most collateral damage when unlearned. By pruning these outliers from the forget set, UPCORE helps the AI forget the targeted information while keeping its other skills intact.
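The summary above describes pruning outliers from the forget set but not the exact selection procedure. As a minimal sketch of the general idea, the snippet below scores each point's hidden-state representation by its distance from the centroid and keeps only the most typical points; the function name, the distance-based scoring, and the `keep_fraction` parameter are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def select_coreset(points, keep_fraction=0.8):
    """Illustrative coreset selection: drop the points farthest from
    the centroid of the representation vectors (a simple stand-in
    for a proper outlier detector)."""
    dim = len(points[0])
    n = len(points)
    # Centroid of all representation vectors.
    centroid = [sum(p[i] for p in points) / n for i in range(dim)]

    def dist(p):
        return math.sqrt(sum((p[i] - centroid[i]) ** 2 for i in range(dim)))

    # Rank indices from most to least typical (closest to centroid first).
    ranked = sorted(range(n), key=lambda i: dist(points[i]))
    k = int(n * keep_fraction)
    return sorted(ranked[:k])  # indices of points kept in the forget set

# Toy example: nine points near the origin plus one far-away outlier.
reps = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0),
        (0, -1), (-1, -1), (1, -1), (-1, 1), (100, 100)]
coreset = select_coreset(reps, keep_fraction=0.9)  # drops the outlier at index 9
```

In practice one would compute these vectors from the model's internal representations of each forget-set example and could swap in a stronger outlier detector; the point is only that pruning high-variance outliers leaves a "core" forget set that is cheaper to unlearn.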
Why it matters?
This matters because as AI becomes more common, we need ways to make it forget sensitive information without breaking it. UPCORE could help companies follow privacy laws and respect user rights while still having useful AI models. It's a step towards making AI systems that can learn and unlearn information in a controlled way, which is important for building trustworthy and adaptable AI.
Abstract
User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods, consistently achieving a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric measuring the area under the curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefitting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
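The abstract's AUC metric summarizes the trade-off curve between deletion efficacy and model preservation. As a hedged sketch (the paper's exact metric pairings are not spelled out here), the function below computes a trapezoidal area under a curve of (deletion score, utility score) points, which is the standard way such an AUC is evaluated:

```python
def trade_off_auc(deletion_scores, utility_scores):
    """Trapezoidal area under a deletion-vs-utility trade-off curve.
    Inputs are paired per-checkpoint scores; higher AUC means the
    model retains more utility at each level of deletion efficacy.
    (Illustrative: the specific metrics plotted are assumptions.)"""
    # Sort points by deletion score so the curve is traversed left to right.
    pts = sorted(zip(deletion_scores, utility_scores))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return auc

# A method that keeps utility flat while deletion improves scores higher
# than one whose utility collapses.
flat = trade_off_auc([0.0, 0.5, 1.0], [0.9, 0.9, 0.9])
collapse = trade_off_auc([0.0, 0.5, 1.0], [0.9, 0.5, 0.1])
```

Averaging such areas across several standard unlearning metrics gives a single number for comparing methods on both objectives at once.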