AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang

2025-05-26

Summary

This paper introduces AudioTrust, a new benchmark for testing how much we can trust large language models that work with audio, such as speech and everyday sounds, by checking their performance across different real-life situations.

What's the problem?

As audio models are used more in important areas like voice assistants, customer service, and safety systems, we need to know whether they are reliable, fair, and accurate. But until now, there has been no thorough way to measure all the different ways these models might succeed or fail.

What's the solution?

The researchers created AudioTrust, a dedicated suite of tests and metrics that examines many sides of trustworthiness, such as accuracy, fairness, and reliability, using a large and varied dataset that represents real-world audio scenarios.
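To make the idea of a multi-dimensional benchmark concrete, here is a minimal sketch of how per-dimension scores might be aggregated. The function name, dimension names, and scores below are illustrative assumptions for this summary, not AudioTrust's actual API or data.

```python
# Hypothetical sketch: aggregating per-sample trustworthiness scores
# into one average score per dimension. Names and values are illustrative.

def evaluate(per_sample_scores: dict[str, list[float]]) -> dict[str, float]:
    """Average the 0-1 scores within each trustworthiness dimension."""
    return {
        dimension: sum(scores) / len(scores)
        for dimension, scores in per_sample_scores.items()
    }

# Illustrative scores for three dimensions mentioned in the summary.
results = evaluate({
    "accuracy":    [1.0, 0.0, 1.0, 1.0],
    "fairness":    [1.0, 1.0, 0.0, 1.0],
    "reliability": [1.0, 1.0, 1.0, 0.0],
})
print(results)
```

Reporting a separate score per dimension, rather than one overall number, lets users see exactly where a model is strong or weak.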

Why it matters?

This matters because it helps developers and users understand how far they can rely on these audio models, ensuring the models work well and safely in everyday situations where mistakes could have serious consequences.

Abstract

AudioTrust evaluates the trustworthiness of Audio Large Language Models across multifaceted dimensions, using a comprehensive dataset and specific metrics to assess their performance in real-world audio scenarios.