CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following
Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa
2025-06-18
Summary
This paper introduces CMI-Bench, a new way to test large language models that work with both audio and text, measuring how well they can follow instructions on music-related tasks.
What's the problem?
The problem is that there weren't good, varied tests for how well these audio-text models understand and work with music-related information, so it was hard to know how capable they really are at music tasks.
What's the solution?
The researchers created CMI-Bench, a benchmark made up of many different kinds of music tasks that require the model to follow instructions carefully. This lets them measure how well models understand and respond to audio and text in the music domain.
Why it matters?
This matters because it helps improve AI that works with music in smart ways, making it possible to build better tools for music analysis, recommendation, and creation by showing exactly how well these models perform.
Abstract
CMI-Bench introduces a comprehensive instruction-following benchmark for audio-text LLMs to evaluate them on a diverse range of music information retrieval tasks.