To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
2024-07-03

Summary
This paper introduces MemFlex, a method that improves how large language models (LLMs) can forget specific sensitive information, such as personal data or copyrighted material, without losing important general knowledge.
What's the problem?
The main problem is that LLMs, which are trained on large amounts of data, often retain sensitive information that they shouldn't. Current methods for 'unlearning' this information can be too broad: they may erase useful knowledge along with the sensitive content, which hurts the model's overall performance.
What's the solution?
To solve this issue, the authors introduce KnowUnDo, a benchmark covering copyrighted content and personal privacy examples that tests whether unlearning methods can separate knowledge that should be forgotten from knowledge that should be retained. They find that many existing methods erase far more than necessary. In response, they develop MemFlex, which uses gradient information to locate and update only the parameter regions that store the sensitive knowledge, leaving the rest of the model intact; a rough sketch of this idea is shown below. Their experiments show that MemFlex outperforms other methods in both effectively unlearning sensitive data and retaining general knowledge.
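The snippet below is a minimal illustrative sketch of gradient-based localization for selective unlearning, not the authors' released MemFlex code: it assumes a hypothetical forget batch, retain batch, loss function, and a simple ratio-based threshold, and it uses gradient ascent on the forget loss restricted to the localized parameters. The exact scoring and update rule in the paper may differ.

```python
import torch

def locate_sensitive_params(model, forget_batch, retain_batch, loss_fn, ratio=0.1):
    """Illustrative sketch (not the released MemFlex implementation):
    rank parameters by how strongly they drive the forget loss relative to the
    retain loss, and return a per-parameter mask of the top fraction."""
    # Gradient magnitudes w.r.t. the data we want to unlearn
    model.zero_grad()
    loss_fn(model, forget_batch).backward()
    forget_grads = {n: p.grad.detach().abs().clone()
                    for n, p in model.named_parameters() if p.grad is not None}

    # Gradient magnitudes w.r.t. the data we want to keep
    model.zero_grad()
    loss_fn(model, retain_batch).backward()
    retain_grads = {n: p.grad.detach().abs().clone()
                    for n, p in model.named_parameters() if p.grad is not None}

    # Score each weight: large forget-gradient, small retain-gradient
    masks = {}
    for name, fg in forget_grads.items():
        score = fg / (retain_grads[name] + 1e-8)
        k = max(1, int(ratio * score.numel()))
        threshold = score.flatten().topk(k).values.min()
        masks[name] = (score >= threshold).float()
    model.zero_grad()
    return masks


def masked_unlearning_step(model, forget_batch, loss_fn, masks, lr=1e-5):
    """One gradient-ascent step on the forget loss, restricted to the
    localized parameters so the rest of the model stays untouched."""
    model.zero_grad()
    loss_fn(model, forget_batch).backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                # Ascend the forget loss, but only where the mask is active
                p.add_(lr * masks[name] * p.grad)
    model.zero_grad()
```

In this sketch, the mask is what keeps the update "precise": weights whose gradients matter mainly for the retain data are never touched, which is the intuition behind avoiding excessive unlearning.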
Why it matters?
This research matters because it helps make AI systems safer and more responsible by making it possible to remove harmful or private information from a model without degrading its useful knowledge. By improving unlearning techniques, we can build AI models that better respect user privacy and comply with copyright law.
Abstract
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material. Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge. However, current unlearning paradigms are mired in vague forgetting boundaries, often erasing knowledge indiscriminately. In this work, we introduce KnowUnDo, a benchmark containing copyrighted content and user privacy domains to evaluate if the unlearning process inadvertently erases essential knowledge. Our findings indicate that existing unlearning methods often suffer from excessive unlearning. To address this, we propose a simple yet effective method, MemFlex, which utilizes gradient information to precisely target and unlearn sensitive parameters. Experimental results show that MemFlex is superior to existing methods in both precise knowledge unlearning and general knowledge retaining of LLMs. Code and dataset will be released at https://github.com/zjunlp/KnowUnDo.