Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz
2025-05-16
Summary
This paper introduces Unilogit, a new technique that lets large language models forget specific information when needed, without degrading the rest of what they know.
What's the problem?
The problem is that AI models sometimes need to 'unlearn' or forget certain things, such as outdated facts or private data, but most current methods either fail to remove the information reliably or end up making the model less useful overall.
What's the solution?
The researchers created Unilogit, which dynamically adjusts the model's output logits so that targeted information is forgotten while the model's other abilities stay intact. The method, called uniform-target self-distillation, pushes the model's prediction for the information to be forgotten toward a uniform probability and uses the model's own current outputs to build the distillation target, which makes selective forgetting more reliable than previous approaches.
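To make the mechanism concrete, here is a minimal PyTorch sketch of what a uniform-target self-distillation loss could look like. The function name `unilogit_forget_loss`, the exact renormalization of the non-target tokens, the KL direction, and the batching are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def unilogit_forget_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Uniform-target self-distillation loss for a batch of forget-set positions.

    logits:     (batch, vocab) outputs of the model being unlearned
    target_ids: (batch,) ids of the tokens whose prediction should be forgotten
    """
    vocab_size = logits.size(-1)
    batch_idx = torch.arange(logits.size(0), device=logits.device)

    # Build the soft target from the model's own current distribution
    # (self-distillation), without backpropagating through it.
    with torch.no_grad():
        target_probs = F.softmax(logits, dim=-1)
        old_p = target_probs[batch_idx, target_ids]

        # Push the probability of the token to be forgotten down to the
        # uniform value 1/|V| ...
        uniform_p = 1.0 / vocab_size

        # ... and rescale the remaining tokens so the target still sums to 1.
        scale = (1.0 - uniform_p) / (1.0 - old_p).clamp_min(1e-8)
        target_probs = target_probs * scale.unsqueeze(-1)
        target_probs[batch_idx, target_ids] = uniform_p

    # Distill the model toward the adjusted soft target with a KL objective.
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, target_probs, reduction="batchmean")
```

In a full training loop, a term like this would be computed on the forget set and combined with a retain-set objective (for example, standard cross-entropy or a KL term toward the original model) so that overall utility is preserved.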
Why it matters?
This matters because it makes AI models safer and more adaptable, allowing them to remove sensitive or incorrect information when needed while remaining helpful and accurate for everything else.
Abstract
Unilogit dynamically adjusts target logits to enable selective forgetting in Large Language Models, maintaining overall utility and outperforming existing methods.