Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

Tiantian Feng, Jihwan Lee, Anfeng Xu, Yoonjeong Lee, Thanathai Lertpetchpun, Xuan Shi, Helin Wang, Thomas Thebaud, Laureano Moro-Velazquez, Dani Byrd, Najim Dehak, Shrikanth Narayanan

2025-05-21

Summary

This paper introduces Vox-Profile, a new benchmark for testing how well AI systems, specifically speech foundation models, can understand and describe different qualities of people's voices and speech patterns.

What's the problem?

The problem is that most current ways of evaluating speech AI, such as the models behind voice assistants or speech-to-text apps, do not capture the many different ways people can sound, such as accent, emotion, or speaking style.

What's the solution?

To solve this, the researchers created Vox-Profile, a benchmark that covers many different traits of speakers and their speech. It helps researchers test and improve AI systems so they can recognize and work with a wider variety of voices and speaking styles.

Why does it matter?

This matters because it helps make speech technology more accurate and fair for everyone, no matter how they sound, which is important for things like accessibility, communication, and making technology more inclusive.

Abstract

Vox-Profile is a benchmark for evaluating multi-dimensional speaker and speech traits using speech foundation models, with applications in analyzing the performance of automatic speech recognition (ASR) and speech generation systems.
