WaterDrum: Watermarking for Data-centric Unlearning Metric

Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

2025-05-09

Summary

This paper introduces WaterDrum, a new way to measure how well large language models can 'unlearn' specific information by embedding a special kind of digital watermark in the text data.

What's the problem?

When people want an AI model to forget certain information, it is hard to tell whether the model has really unlearned it. Most current methods measure how useful the model remains after unlearning, but they don't directly check whether the influence of the unwanted data is actually gone.

What's the solution?

The researchers created WaterDrum, which embeds a robust watermark in the training data and then checks how strongly that watermark still shows up in the model's outputs. If the watermark signal drops after unlearning, the data's influence has genuinely been removed. This gives a clearer, more direct measure of unlearning than looking only at the model's overall performance.
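The idea of using a watermark signal to quantify forgetting can be illustrated with a minimal toy sketch. This is not the paper's actual algorithm; the keyed "green list" watermark, the function names, and the scoring rule below are all hypothetical simplifications chosen for illustration.

```python
import hashlib

# Toy sketch (assumed, not WaterDrum's real method): texts in the forget
# set carry a token-level watermark, and the unlearning score is how much
# the watermark signal drops in the model's outputs after unlearning.

def green_fraction(tokens, key="forget-set-key"):
    """Fraction of tokens falling in a keyed 'green list' (toy watermark)."""
    def is_green(tok):
        # Hash (key + token) so the green list is secret without the key.
        h = hashlib.sha256((key + tok).encode()).digest()
        return h[0] % 2 == 0  # roughly half of all tokens are green
    if not tokens:
        return 0.0
    return sum(is_green(t) for t in tokens) / len(tokens)

def watermark_strength(text, key="forget-set-key"):
    """Watermark signal in a piece of generated text, in [0, 1]."""
    return green_fraction(text.lower().split(), key)

def unlearning_score(before_text, after_text, key="forget-set-key"):
    """Drop in watermark strength after unlearning; higher means more forgotten."""
    return watermark_strength(before_text, key) - watermark_strength(after_text, key)
```

In this sketch, watermarked training data would push the model toward generating green-listed tokens; after successful unlearning, the model's outputs should regress toward the chance-level green fraction, and `unlearning_score` would capture that drop.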

Why it matters?

Being able to make AI models forget specific information is important for privacy, security, and fairness. WaterDrum helps verify that when data needs to be removed, its influence really is gone, which builds more trust in how AI systems are managed.

Abstract

WaterDrum introduces a data-centric unlearning metric for LLMs using robust text watermarking to address limitations in existing utility-centric metrics.