
VLSBench: Unveiling Visual Leakage in Multimodal Safety

Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao

2024-12-03


Summary

This paper presents VLSBench, a new multimodal safety benchmark that uncovers a visual safety information leakage (VSIL) problem in existing benchmarks and tests whether multimodal large language models (MLLMs) can stay safe when the risky content is visible only in the image, not in the text.

What's the problem?

Prior work reported a counter-intuitive result: aligning MLLMs with text-only methods (textual unlearning) achieves safety performance comparable to training on image-text pairs. The authors trace this to visual safety information leakage (VSIL) in existing multimodal safety benchmarks: the risky or sensitive content shown in the image is also spelled out in the textual query, so a model can refuse the request based on the text alone, without ever understanding the image. Image-text pairs without such leakage are common in real-world use but are overlooked by existing benchmarks, which therefore overestimate how safe these models really are.
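To make the leakage concrete, here is a minimal, hypothetical sketch of two image-text pairs: in the first, the risky content of the image is restated in the text, so a purely textual filter can refuse it; in the second, the risk is visible only in the image. The file names, queries, and keyword list are illustrative assumptions, not samples from the benchmark.

```python
# Minimal sketch (not the paper's data or code): two hypothetical image-text
# pairs illustrating visual safety information leakage (VSIL). In the leaky
# pair, the risky content shown in the image is also spelled out in the text,
# so a text-only safety filter can refuse it without looking at the image.

leaky_pair = {
    "image": "photo_of_a_weapon.jpg",                     # hypothetical file name
    "query": "How can I use this gun to hurt someone?",   # risk stated in the text
}

leak_free_pair = {
    "image": "photo_of_a_weapon.jpg",
    "query": "How do I use the object in this picture?",  # risk visible only in the image
}

RISKY_TERMS = {"gun", "weapon", "hurt", "kill"}  # toy keyword list for illustration

def text_only_filter(query: str) -> bool:
    """Return True if a naive text-only filter would refuse the query."""
    words = {w.strip("?.,!").lower() for w in query.split()}
    return bool(words & RISKY_TERMS)

print(text_only_filter(leaky_pair["query"]))      # True: refused from text alone
print(text_only_filter(leak_free_pair["query"]))  # False: the image must be inspected
```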

What's the solution?

The authors construct VLSBench, a visual leakless safety benchmark of 2.4k image-text pairs in which the sensitive information is deliberately kept out of the textual query, so a model must actually interpret the image to recognize the risk. Evaluating both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o, they find that VLSBench poses a significant challenge to all of them.
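The kind of evaluation this benchmark enables can be sketched roughly as below. This is an assumption-laden illustration, not the authors' evaluation code: the sample format, the `model.generate` interface, and the refusal heuristic are placeholders for the actual protocol released with the paper.

```python
# A minimal sketch (not the authors' code) of how a safe-response rate on
# VLSBench-style image-text pairs could be computed. The dataset format, the
# refusal heuristic, and the model interface are all assumptions here.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry")  # toy heuristic

def is_safe_response(reply: str) -> bool:
    """Treat an explicit refusal as a safe outcome (simplified heuristic)."""
    return reply.lower().startswith(REFUSAL_MARKERS)

def evaluate_safety(model, pairs) -> float:
    """Fraction of risky image-text pairs that the model handles safely."""
    safe = 0
    for sample in pairs:  # each sample: {"image": path, "query": text}
        reply = model.generate(image=sample["image"], prompt=sample["query"])
        if is_safe_response(reply):
            safe += 1
    return safe / len(pairs)

# Usage (hypothetical model object exposing .generate):
# rate = evaluate_safety(mllm, vlsbench_pairs)
# print(f"safe-response rate: {rate:.1%}")
```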

Why it matters?

This research matters because it explains why text-only alignment appeared sufficient for multimodal safety: it works only when the benchmark leaks the visual risk into the text. For realistic scenarios without such leakage, multimodal alignment is the more promising route. VLSBench gives the community a way to measure genuine multimodal safety and to guide the development of MLLMs that remain safe when the danger is visible only in the image.

Abstract

Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counter-intuitive phenomenon: using textual unlearning to align MLLMs achieves safety performance comparable to MLLMs trained with image-text pairs. To explain this counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has already been revealed in the textual query. In this way, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs that prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is sufficient for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL. Please see our code and data at: http://hxhcreate.github.io/VLSBench