Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models

Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Chenxi Wang, Guangxian Ouyang, Zhenhao Chen, Xiuying Chen

2025-05-22

Summary

This paper introduces AJailBench, a new benchmark that tests how easily large audio-language models can be tricked into breaking their safety rules using sneaky audio prompts.

What's the problem?

Large audio-language models, which understand and respond to spoken language, can sometimes be fooled into giving harmful or unsafe answers by cleverly designed audio tricks. Until now, there hasn't been a good, systematic way to measure how vulnerable these models actually are.

What's the solution?

The researchers created AJailBench, a benchmark that uses a collection of tricky audio prompts to test these models. They found that many models have serious safety weaknesses, especially when the audio is altered just enough to keep its meaning while avoiding detection.
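
To make the idea concrete, here is a minimal Python sketch (not the authors' code) of what "meaning-preserving" audio perturbations and a jailbreak check might look like: the waveform is lightly noised or slowed so a human still hears the same request, and the benchmark counts how often the model's refusal fails to trigger. The query_model callback, the keyword-based is_refusal check, and both perturbation functions are illustrative placeholders, not AJailBench's actual components.

import numpy as np

def add_noise(wave: np.ndarray, snr_db: float = 30.0) -> np.ndarray:
    # Inject low-level Gaussian noise at a target signal-to-noise ratio,
    # quiet enough that the spoken request stays intelligible.
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def change_speed(wave: np.ndarray, rate: float = 0.95) -> np.ndarray:
    # Slightly slow down (rate < 1) or speed up (rate > 1) the audio
    # via linear resampling; the words and meaning are unchanged.
    old_idx = np.arange(len(wave))
    new_idx = np.arange(0, len(wave), rate)
    return np.interp(new_idx, old_idx, wave)

PERTURBATIONS = [add_noise, change_speed]

def is_refusal(response: str) -> bool:
    # Crude keyword check; a real benchmark would use a stronger judge.
    return any(kw in response.lower() for kw in ("i can't", "i cannot", "sorry"))

def evaluate(prompts, query_model):
    # For each harmful audio prompt, apply every perturbation and count
    # how often the model answers instead of refusing (the bypass rate).
    bypasses = 0
    for wave in prompts:
        for perturb in PERTURBATIONS:
            if not is_refusal(query_model(perturb(wave))):
                bypasses += 1
    return bypasses / (len(prompts) * len(PERTURBATIONS))

A higher bypass rate under such perturbations is exactly the kind of weakness the paper reports: the attack content is unchanged to a human listener, but the model's safety filter no longer recognizes it.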

Why it matters?

This matters because it shows that current audio-language models aren't as safe as people might think. That finding is important for making voice assistants and other audio-based AI systems more secure and trustworthy.

Abstract

AJailBench evaluates jailbreak vulnerabilities in Large Audio Language Models using a dataset of adversarial audio prompts, revealing consistent safety weaknesses and the effectiveness of semantically preserved perturbations.