
ReLearn: Unlearning via Learning for Large Language Models

Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang

2025-02-18


Summary

This paper introduces ReLearn, a new method that helps large language models (LLMs) forget specific information without hurting their ability to generate good text. It's like teaching a smart computer to selectively erase certain memories while keeping its overall knowledge intact.

What's the problem?

Current ways of making an AI forget things often damage how well it can write and understand language. They work by pushing down the probability of specific words or phrases, which can make the AI's responses sound strange or incoherent. On top of that, the usual ways of measuring how well the AI has forgotten don't do a good job of checking whether it can still write fluently and stay relevant.
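
For intuition, here is a minimal sketch of the kind of objective the abstract below calls "reverse optimization": gradient ascent that pushes down the probability of the tokens to be forgotten. This is not the paper's code; the model name and the fact to forget are placeholders, and the point is only to show why maximizing the loss on target tokens can also disturb what the model predicts afterwards.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal language model works for this illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_text = "Alice's phone number is 555-0199."  # hypothetical fact to forget
batch = tokenizer(forget_text, return_tensors="pt")

# The standard causal-LM loss is the negative log-likelihood of the text.
# "Reverse optimization" (gradient ascent) maximizes that loss instead,
# pushing probability mass away from the target tokens -- which also
# perturbs the model's predictions for every token that follows them.
loss = -model(**batch, labels=batch["input_ids"]).loss  # negate to ascend
loss.backward()
optimizer.step()
optimizer.zero_grad()
```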

What's the solution?

The researchers created ReLearn, which takes the opposite approach: instead of punishing the model for remembering, it augments the training data and fine-tunes the AI so that it forgets specific things while still producing natural answers. They also designed new ways to measure forgetting and remembering, called the Knowledge Forgetting Rate (KFR) and the Knowledge Retention Rate (KRR), plus a Linguistic Score (LS) to check whether the AI's writing is still good quality. Their tests showed that ReLearn could make the AI forget specific things while still writing well.
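
To make that concrete, here is a hedged sketch of what one step of a data-augmentation-plus-fine-tuning unlearning pipeline could look like. It is not the authors' implementation (their code is linked in the abstract below); the model, the example pair, and the augmentation strategy are all placeholders, assuming the augmented data pairs a sensitive question with a rewritten answer that no longer reveals the fact.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical augmented data: the original question paired with a rewritten
# answer that avoids the sensitive fact. The real pipeline's augmentation is
# more elaborate; this is only the skeleton.
augmented_pairs = [
    ("What is Alice's phone number?",
     "I'm sorry, but I can't share personal contact details."),
]

model.train()
for question, new_answer in augmented_pairs:
    batch = tokenizer(f"{question} {new_answer}", return_tensors="pt")
    # Ordinary fine-tuning: MINIMIZE the negative log-likelihood of the
    # replacement answer -- the opposite of the gradient-ascent sketch above.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```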

Why it matters?

This matters because as AI gets smarter and knows more, we sometimes need to make it forget things like private information or incorrect facts. ReLearn offers a way to do this without breaking the AI's ability to communicate well. This could help make AI safer and more reliable to use in the real world, where we might need to update or correct what the AI knows without starting from scratch.

Abstract

Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
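
As a rough illustration of what knowledge-level metrics like KFR and KRR measure, here is a hedged sketch: the share of forget-set questions the unlearned model no longer answers correctly, and the share of retain-set questions it still answers correctly. The paper's exact definitions may differ, and the `answers_correctly` judge below is just a placeholder for whatever knowledge-level check is used.

```python
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # (question, reference answer)

def knowledge_rates(
    forget_set: List[QA],
    retain_set: List[QA],
    answers_correctly: Callable[[str, str], bool],  # placeholder judge
) -> Tuple[float, float]:
    """Illustrative knowledge-level rates, not the paper's exact formulas.

    KFR-like score: share of forget-set items the model now gets wrong.
    KRR-like score: share of retain-set items the model still gets right.
    """
    kfr = sum(not answers_correctly(q, a) for q, a in forget_set) / len(forget_set)
    krr = sum(answers_correctly(q, a) for q, a in retain_set) / len(retain_set)
    return kfr, krr
```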