
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models

Jian Chen, Wenye Ma, Penghang Liu, Wei Wang, Tengwei Song, Ming Li, Chenguang Wang, Ruiyi Zhang, Changyou Chen

2025-07-02


Summary

This paper introduces MusiXQA, a new dataset designed to help AI models understand sheet music. It pairs images of music sheets with detailed questions and answers about notes, chords, and other musical elements, providing material to both train and evaluate AI models.

What's the problem?

While AI models have become good at understanding pictures and text, they still struggle to interpret sheet music correctly. Music sheets use specialized symbols and structural conventions that existing models find hard to read and understand.

What's the solution?

The researchers created MusiXQA, a large, richly annotated dataset of synthetic music sheets generated with a music typesetting tool. They also developed Phi-3-MusiX, an AI model fine-tuned on this dataset, which answers questions about music sheets far more accurately than previous models.
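To make the idea of image-plus-QA training data concrete, here is a minimal sketch of what one such record and its fine-tuning prompt might look like. The field names, example values, and `to_prompt` helper are illustrative assumptions for this summary, not the dataset's actual schema.

```python
# Hypothetical sketch of a MusiXQA-style record.
# All field names and values are illustrative assumptions,
# not the dataset's real schema.
record = {
    "image": "sheet_00042.png",   # rendered music-sheet image
    "question": "What is the key signature of this piece?",
    "answer": "G major",
    "category": "key_signature",  # e.g. notes, chords, clefs, ...
}

def to_prompt(rec):
    """Format a record as a simple visual-QA prompt for fine-tuning."""
    return f"<image: {rec['image']}>\nQ: {rec['question']}\nA: {rec['answer']}"

print(to_prompt(record))
```

A training pipeline would typically render or load the referenced image and pass it alongside the text prompt to the multimodal model.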

Why it matters?

This matters because improving AI's ability to read sheet music can support music education, digital music production, and the preservation of musical knowledge by enabling smarter tools that read and analyze music automatically.

Abstract

MusiXQA, a new dataset for evaluating MLLMs on music sheet understanding, reveals limitations and enables the development of Phi-3-MusiX, an improved MLLM for this task.