MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco, Donya Rooein, Patrick Schramowski, Anastassia Shaitarova, Xudong Shen, Richard Willats, Andrea Zugarini, Bertie Vidgen
2025-01-22

Summary
This paper talks about a new way to test if AI systems that understand both images and text (called Vision-Language Models or VLMs) are safe to use. The researchers created a special test called MSTS to check if these AI systems might give dangerous advice when shown certain combinations of pictures and words.
What's the problem?
VLMs are becoming more common in things like chat assistants, but they might accidentally give harmful advice or encourage unsafe behavior if they're not properly designed. Until now, there hasn't been a good way to test if these AI systems are safe, especially when it comes to understanding both images and text together.
What's the solution?
The researchers created MSTS, which stands for Multimodal Safety Test Suite. It's like a big quiz for AI with 400 questions that combine pictures and text in tricky ways. These questions cover 40 different types of potential dangers. They used MSTS to test several VLMs and found that many of them had safety problems. They also translated the test into ten different languages and found that the AI systems were more likely to give unsafe answers in languages other than English.
Why it matters?
This matters because as AI systems that can understand both images and text become more common in our everyday lives, we need to make sure they're safe to use. MSTS gives developers a way to check their AI for potential dangers before releasing it to the public. It also shows that current safety measures might not be good enough, especially when it comes to different languages or when combining images and text. This research could help make AI assistants and other applications safer and more trustworthy for everyone to use.
Abstract
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses. We also show models to be safer when tested with text only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.