BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Zhiheng Xi, Guanyu Li, Yutao Fan, Honglin Guo, Yufang Liu, Xiaoran Fan, Jiaqi Liu, Jingchao Ding, Wangmeng Zuo, Zhenfei Yin, Lei Bai, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
2025-07-08
Summary
This paper introduces BMMR, a large and diverse dataset designed to evaluate and improve large multimodal models that reason over both text and images across many subjects. It contains over 110,000 college-level questions in English and Chinese, spanning hundreds of disciplines and multiple question formats.
What's the problem?
Existing datasets for testing multimodal models mostly target narrow areas such as math, or cover only a single language. This limits how well models can be developed and evaluated for broader, more complex reasoning across different disciplines and languages.
What's the solution?
The researchers collected data from books, exams, and quizzes and cleaned it through a careful human-and-machine curation process, producing high-quality questions paired with detailed reasoning paths. The data is organized into two splits: a high-quality evaluation set and a larger training set. They also built a multi-discipline verifier to more accurately assess how well models reason toward their answers, rather than checking only the final answer.
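To make the dataset structure and the verifier's role concrete, below is a minimal sketch in Python. The record fields (question, image_path, reasoning_path, etc.) and the token-overlap `verify_reasoning` function are illustrative assumptions only; the paper's actual schema and its learned verifier are not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical schema for a single BMMR-style item; the field names are
# illustrative assumptions, not the dataset's actual format.
@dataclass
class BMMRItem:
    question: str                 # question text (English or Chinese)
    image_path: str               # path to the associated figure or diagram
    discipline: str               # e.g. "Physics"
    language: str                 # "en" or "zh"
    answer: str                   # reference answer
    reasoning_path: list[str] = field(default_factory=list)  # step-by-step solution


def verify_reasoning(predicted_steps: list[str], reference_steps: list[str]) -> float:
    """Toy stand-in for a multi-discipline verifier.

    The paper's verifier is a learned model; this token-overlap heuristic only
    illustrates the interface: reasoning steps in, a score in [0, 1] out.
    """
    if not reference_steps:
        return 0.0
    matched = 0
    for ref in reference_steps:
        ref_tokens = set(ref.lower().split())
        # Count a reference step as covered if any predicted step shares
        # most of its tokens.
        if any(len(ref_tokens & set(p.lower().split())) >= 0.5 * len(ref_tokens)
               for p in predicted_steps):
            matched += 1
    return matched / len(reference_steps)


if __name__ == "__main__":
    item = BMMRItem(
        question="What force keeps the satellite in circular orbit?",
        image_path="figures/orbit.png",
        discipline="Physics",
        language="en",
        answer="Gravity",
        reasoning_path=[
            "Identify the only force acting toward the center",
            "That centripetal force is gravity",
        ],
    )
    model_steps = [
        "The force acting toward the center is gravity",
        "So gravity provides the centripetal force",
    ]
    print(f"Reasoning score: {verify_reasoning(model_steps, item.reasoning_path):.2f}")
```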
Why it matters?
BMMR lets researchers build and evaluate models that understand and reason across many different fields and languages, making AI more useful for global, multilingual applications that involve complex knowledge and reasoning.
Abstract
A large-scale bilingual, multimodal, multi-disciplinary reasoning dataset (BMMR) is introduced to evaluate and develop large multimodal models (LMMs) across various disciplines and formats, with a focus on reasoning paths and discipline-specific performance.