LLM Unlearning Should Be Form-Independent
Xiaotian Ye, Mengqi Zhang, Shu Wu
2025-06-15
Summary
This paper examines how to make large language models (LLMs) unlearn specific knowledge in a way that holds no matter how that knowledge is asked about or expressed. It introduces Rank-one Concept Redirection (ROCR), a method that modifies the model's internal representation of the targeted concept, redirecting it to a benign one so the model stops recalling the associated harmful or private information.
What's the problem?
Current LLM unlearning methods tend to work only when the knowledge is queried in the same form it took in the training data. When the same knowledge appears in a different form or expression, these methods often fail, a limitation the paper calls Form-Dependent Bias, which undermines their usefulness for controlling unwanted or dangerous information.
What's the solution?
The solution is ROCR, a training-free approach. Instead of retraining, it applies a fast, targeted update to specific internal layers that redirects the concept to be forgotten toward another, harmless concept. Because the edit acts on the concept itself rather than on particular phrasings, the model no longer recalls the original information however the question is worded, making the unlearning form-independent and more effective.
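To make the idea concrete, here is a minimal sketch in PyTorch of what a rank-one redirection edit to a single weight matrix could look like. It assumes the edit resembles closed-form rank-one model editing: W stands for one projection matrix inside a transformer layer, k for a key vector representing the concept to forget, v_star for the output the benign target concept would produce, and C for a covariance matrix of keys estimated over generic text. These names, the layer choice, and the objective are illustrative assumptions, not the authors' exact formulation.

    # Hypothetical sketch of a rank-one concept-redirection update.
    # Assumptions (not from the paper): the edit targets one projection matrix W,
    # the key k encodes the concept to forget, and v_star is the activation the
    # harmless target concept would produce at that matrix's output.
    import torch

    def rank_one_redirect(W: torch.Tensor,      # (d_out, d_in) weight matrix to edit
                          k: torch.Tensor,      # (d_in,) key vector of the concept to forget
                          v_star: torch.Tensor, # (d_out,) desired output for the benign concept
                          C: torch.Tensor       # (d_in, d_in) covariance of keys over generic text
                          ) -> torch.Tensor:
        # Closed-form rank-one update: W' = W + (v_star - W k)(C^{-1} k)^T / (k^T C^{-1} k),
        # so that W' k = v_star while keys dissimilar to k are perturbed as little as possible.
        residual = v_star - W @ k               # change needed in the output at key k
        c_inv_k = torch.linalg.solve(C, k)      # C^{-1} k without forming an explicit inverse
        update = torch.outer(residual, c_inv_k) / (k @ c_inv_k)
        return W + update

    # Usage sketch: W_edited = rank_one_redirect(W, k_forget, v_benign, C_keys)

Because the update is rank one and computed in closed form, it can be applied without any gradient-based retraining, which matches the paper's claim that ROCR modifies the model quickly while leaving unrelated knowledge largely untouched.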
Why it matters?
In real-world applications, the same harmful or private information can be requested in many different ways, so unlearning that covers only one form is not enough. ROCR's form-independent approach improves safety and control over what LLMs retain, helping prevent misuse and protect privacy while leaving the model's other knowledge intact.
Abstract
Form-Dependent Bias limits the effectiveness of LLM unlearning across different expressions of the same knowledge; Rank-one Concept Redirection (ROCR) is proposed as a form-independent solution that enhances unlearning efficacy.