Cross-Modality Safety Alignment
Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang
2024-06-26

Summary
This paper introduces a new challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate how well artificial general intelligence (AGI) systems can ensure safety when they combine information from multiple modalities, such as images and text. It highlights the risk that inputs which are each safe on their own can, when combined, lead to unsafe or unethical outputs.
What's the problem?
As AGI becomes more integrated into everyday life, it is crucial that these systems operate safely and ethically. Most previous studies have examined the safety of only one type of data at a time (single-modality), which overlooks the dangers that can arise when different types of data are combined. Each input may appear safe on its own, yet together they can produce harmful outcomes.
What's the solution?
The authors created the SIUO benchmark to test how well AI systems handle these cross-modality interactions. It covers nine critical safety domains, including self-harm, illegal activities, and privacy violations. By examining how existing large vision-language models (LVLMs), such as GPT-4V and LLaVA, respond to these challenges, they found significant safety vulnerabilities in both closed-source and open-source models. This research aims to improve our understanding of how AI can safely interpret and respond to complex, real-world situations. A minimal sketch of what such an evaluation loop might look like is given below.
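To make the setup concrete, here is a minimal sketch of how an SIUO-style check could be run: each benchmark item pairs an individually safe image with individually safe text, the combined input is sent to a vision-language model, and the response is judged for safety. The query_model and judge_is_safe helpers are hypothetical placeholders standing in for a real model API and safety judge; this is not the paper's actual evaluation harness.

# Minimal sketch of an SIUO-style evaluation loop. The helpers below are
# hypothetical placeholders, not the paper's actual harness.
from dataclasses import dataclass

@dataclass
class SIUOItem:
    image_path: str  # an image that is safe on its own
    text: str        # a text prompt that is safe on its own
    domain: str      # one of the 9 safety domains, e.g. "self-harm"

def query_model(image_path: str, text: str) -> str:
    """Hypothetical: send the image-text pair to an LVLM, return its reply."""
    raise NotImplementedError

def judge_is_safe(response: str) -> bool:
    """Hypothetical: decide whether the model's response is safe and ethical."""
    raise NotImplementedError

def safe_response_rate(items: list[SIUOItem]) -> float:
    """Fraction of benchmark items for which the model responds safely."""
    safe = sum(judge_is_safe(query_model(i.image_path, i.text)) for i in items)
    return safe / len(items)

In the paper, responses are assessed against the nine safety domains above; the judge is left abstract here.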
Why it matters?
This research is important because it addresses a critical gap in AI safety by considering how different types of information interact. By developing a framework to evaluate these interactions, the study helps identify weaknesses in current AI models and promotes the creation of safer, more reliable AGI systems that better protect users from potential harm.
Abstract
As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.