Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

Jaden Park, Mu Cai, Feng Yao, Jingbo Shang, Soochahn Lee, Yong Jae Lee

2025-11-07

Summary

This paper investigates a problem with a class of AI models that can understand both images and text, called Vision-Language Models. These models score impressively on benchmark tests, but there's a worry that some of that performance comes from accidentally 'seeing' the test questions during training.

What's the problem?

Vision-Language Models are trained on massive amounts of data scraped from the internet. That data can accidentally include the same images and text that are *also* used to test the models. When this happens, the model isn't really learning; it's memorizing the answers, which inflates its benchmark scores. Previous work has focused on cleaning the training data or redesigning the benchmarks, but comparatively little attention has gone to *detecting* whether a model is contaminated in the first place. The authors also show that existing detection methods either fail outright or behave inconsistently when applied to these image-text models.

What's the solution?

To study this, the researchers deliberately mixed benchmark test data into the training sets of open-source Vision-Language Models, simulating contamination. They then developed a new detection method based on multi-modal semantic perturbation: slightly altering the images or the text of a test question and checking whether the model's answers change drastically. Contaminated models tend to break under these small changes because they rely on memorizing exact examples, while properly trained models still understand the underlying concepts. The method worked consistently well across several different contamination strategies.
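To make the idea concrete, here is a minimal sketch of the detection logic described above: compare a model's accuracy on original benchmark items against its accuracy on semantically perturbed versions, and flag a suspicious drop. This is an illustration, not the paper's implementation; the function names (`accuracy`, `contamination_signal`, `answer_fn`) and the threshold value are assumptions introduced here for clarity.

```python
from typing import Callable, Sequence, Tuple

def accuracy(answer_fn: Callable[[object, str], str],
             items: Sequence[Tuple[object, str, str]]) -> float:
    """Fraction of (image, question, gold_answer) items answered correctly."""
    correct = sum(
        answer_fn(image, question).strip().lower() == gold.strip().lower()
        for image, question, gold in items
    )
    return correct / max(len(items), 1)

def contamination_signal(answer_fn: Callable[[object, str], str],
                         original_items: Sequence[Tuple[object, str, str]],
                         perturbed_items: Sequence[Tuple[object, str, str]],
                         drop_threshold: float = 0.15) -> dict:
    """Flag likely contamination when accuracy collapses on semantically
    perturbed benchmark items but stays high on the originals.

    drop_threshold is an illustrative placeholder, not a value from the paper.
    """
    acc_orig = accuracy(answer_fn, original_items)
    acc_pert = accuracy(answer_fn, perturbed_items)
    drop = acc_orig - acc_pert
    return {
        "accuracy_original": acc_orig,
        "accuracy_perturbed": acc_pert,
        "accuracy_drop": drop,
        "flagged_contaminated": drop > drop_threshold,
    }
```

In practice, `answer_fn` would wrap a VLM's inference call, `original_items` would be benchmark samples, and `perturbed_items` would be the same samples with small semantic edits to the image or question text. A clean model should answer both versions comparably; a contaminated model's accuracy tends to collapse on the perturbed set.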

Why it matters?

This research matters because it gives us a way to check whether a Vision-Language Model is genuinely understanding its inputs or just regurgitating memorized answers, which is crucial for building trustworthy AI systems. If contamination can be detected reliably, we can have more confidence in reported benchmark results and avoid relying on models whose scores were inflated by leaked test data. The authors plan to release their code and perturbed dataset publicly.

Abstract

Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior works have proposed mitigation strategies such as decontamination of pretraining data and benchmark redesign for LLMs, the complementary direction of developing detection methods for contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or exhibit inconsistent behavior. We then propose a novel simple yet effective detection method based on multi-modal semantic perturbation, demonstrating that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset will be released publicly.