A survey proposes a systematic taxonomy for evaluating large audio-language models across dimensions including auditory awareness, knowledge reasoning, dialogue ability, and fairness, to address fragmented benchmarks in the field.

This paper is a survey that sets out a complete, organized framework for judging how well large audio-language models perform in different areas, such as understanding sounds, reasoning, holding conversations, and being fair.

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

Summary

What's the problem?

What's the solution?

Why does it matter?

Abstract