Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation

Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal

2025-05-06

Summary

This paper introduces UnLOK-VQA, a new benchmark for testing how well AI models can forget or erase sensitive information they have learned, especially when that information comes from both images and text.

What's the problem?

AI models sometimes remember private or sensitive details they shouldn't, and it is hard to verify that they have truly forgotten this information, especially when it comes from multiple sources such as images and text.

What's the solution?

The researchers created a benchmark for testing and comparing different methods of making AI models forget specific information. They found that certain attacks, particularly multimodal ones, are better at revealing what a model still remembers, which shows that sensitive answers must be fully removed from the model's internal states rather than merely suppressed.
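To make the attack-defense idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's actual method: the "model" is just a question-to-answer lookup, and the names (`attack_success_rate`, `probes`) are illustrative. The point it demonstrates is that an unlearning defense which removes only one phrasing of a question still leaks the answer to rephrased probe attacks.

```python
# Toy sketch of an attack-defense evaluation for unlearning.
# The "model" is a simple question->answer lookup, standing in for
# a real multimodal LLM. All names here are illustrative.

def attack_success_rate(model, probes, secret):
    """Fraction of probe questions that still extract the sensitive answer."""
    hits = sum(1 for q in probes if model.get(q) == secret)
    return hits / len(probes)

# A model that memorized one sensitive fact under several phrasings.
model = {
    "Where does Alice live?": "Springfield",
    "What city is Alice from?": "Springfield",
    "Alice's hometown?": "Springfield",
}
probes = list(model.keys())

# A weak defense: "forget" only the original phrasing of the question...
defended = dict(model)
del defended["Where does Alice live?"]

# ...so rephrased attack probes still succeed, illustrating why the
# answer must be removed from the model's internal state, not just
# blocked for one prompt.
before = attack_success_rate(model, probes, "Springfield")     # 1.0
after = attack_success_rate(defended, probes, "Springfield")   # 2/3
```

Running this shows the attack success rate dropping only from 1.0 to about 0.67, the kind of gap the benchmark is designed to measure across real attacks and defenses.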

Why it matters?

This matters because it helps keep personal and private data safe when using AI, making these models more trustworthy and secure for everyone.

Abstract

UnLOK-VQA, a multimodal unlearning benchmark, evaluates methods for the targeted forgetting of specific multimodal knowledge from large language models, demonstrating the superiority of multimodal attacks and the importance of removing answer information from model states.