Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Xuankun Rong, Wenke Huang, Jian Liang, Jinhe Bi, Xun Xiao, Yiming Li, Bo Du, Mang Ye
2025-05-23
Summary
This paper introduces a new way to protect large AI models that work with both text and images (multimodal large language models) from hidden attacks called backdoors, which can make a model misbehave whenever a secret trigger appears in its input.
What's the problem?
When these AI models are fine-tuned, an attacker can sneak poisoned examples into the training data that act like a secret code: if the model later sees that code, it gives wrong or even dangerous answers. These tricks are hard to catch, especially without extra clean data to compare against or permission to change the model.
What's the solution?
The researchers created a defense called Believe Your Eyes (BYE). It examines how the model spreads its attention across the different parts of an input and measures the entropy (spread) of that attention. Poisoned samples tend to show unusual attention-entropy patterns, so BYE can spot and filter them out of the training data, all without needing extra labels, clean reference data, or any changes to the model itself.
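The core filtering idea can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's actual implementation: it computes the Shannon entropy of each sample's attention distribution and flags samples whose entropy is a statistical outlier within the batch (the function names, the z-score threshold, and the outlier rule are all assumptions for this sketch).

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of an attention weight vector (normalized first)."""
    p = np.asarray(attn, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

def filter_suspicious(attn_maps, z_thresh=2.0):
    """Flag samples whose attention entropy deviates strongly from the batch.

    Intuition (an assumption in this sketch): a backdoored sample often
    concentrates attention sharply on the trigger region, giving it an
    unusually low entropy compared with clean samples.
    """
    ents = np.array([attention_entropy(a) for a in attn_maps])
    z = (ents - ents.mean()) / (ents.std() + 1e-8)
    return [i for i, s in enumerate(z) if abs(s) > z_thresh]
```

For example, in a batch of mostly uniform attention maps, a single sharply peaked map (all weight on one token) would be flagged as suspicious and removed before fine-tuning continues on the remaining samples.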
Why does it matter?
This matters because it helps keep AI models safe and trustworthy, ensuring they cannot be easily hijacked for harmful purposes. That protection becomes more important as these models are deployed in real-world applications.
Abstract
A novel defense framework, Believe Your Eyes (BYE), identifies and filters backdoor samples in fine-tuned multimodal large language models by analyzing attention entropy patterns, preventing trigger activation without requiring additional labels or model changes.