
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa

2024-10-23

Summary

This paper introduces JMMMU, a new benchmark for evaluating large multimodal models (LMMs) on expert-level tasks in Japanese, helping researchers understand how well these models handle questions that require knowledge of the Japanese language and Japanese culture.

What's the problem?

Most existing LMMs are trained primarily on English data, so they often perform worse in non-English languages such as Japanese and lack the cultural knowledge those languages carry. Until now there has been no large-scale Japanese benchmark for measuring how well LMMs handle expert-level tasks grounded in Japanese cultural context, which makes it hard to tell whether poor performance comes from the language itself or from missing cultural understanding.

What's the solution?

The authors created JMMMU, which has two complementary parts: a culture-agnostic (CA) subset, in which culture-independent subjects (such as math) from the English MMMU benchmark are translated into Japanese so scores can be compared one-to-one with their English counterparts, and a culture-specific (CS) subset of newly written questions that reflect Japanese cultural context. Evaluating a model on both subsets separates how well it handles the Japanese language from how deep its Japanese cultural knowledge actually is.
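To make the two-subset design concrete, below is a minimal Python sketch of how scoring against JMMMU could be organized. The question records, field names, and model predictions are invented placeholders rather than the authors' actual data or evaluation code; the point is only that every question carries a culture-agnostic or culture-specific tag, so accuracy can be reported separately for each subset.

from collections import defaultdict

# Hypothetical question records. In the real benchmark each item also carries
# an image and multiple-choice options; only the fields needed to illustrate
# the CA/CS split are kept here.
questions = [
    {"subset": "culture_agnostic", "subject": "Math",              "answer": "B", "prediction": "B"},
    {"subset": "culture_agnostic", "subject": "Physics",           "answer": "C", "prediction": "A"},
    {"subset": "culture_specific", "subject": "Japanese Art",      "answer": "A", "prediction": "D"},
    {"subset": "culture_specific", "subject": "Japanese Heritage", "answer": "D", "prediction": "D"},
]

def subset_accuracy(items):
    # Accuracy per subset: correct predictions divided by total questions.
    correct, total = defaultdict(int), defaultdict(int)
    for q in items:
        total[q["subset"]] += 1
        correct[q["subset"]] += q["prediction"] == q["answer"]
    return {s: correct[s] / total[s] for s in total}

scores = subset_accuracy(questions)
print(f"Culture-agnostic (CA) accuracy: {scores['culture_agnostic']:.2f}")
print(f"Culture-specific (CS) accuracy: {scores['culture_specific']:.2f}")

Reporting the two numbers side by side is what lets the authors separate language ability from cultural knowledge.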

Why it matters?

This work matters because it gives researchers a concrete way to measure how well AI models serve Japanese speakers on culturally relevant tasks, which is the first step toward improving them. By pushing LMM development beyond English, it improves the experience for a broader audience and sets a standard for building high-quality, culturally aware benchmarks for multilingual LMM development.

Abstract

Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features two complementary subsets: (i) a culture-agnostic (CA) subset, where culture-independent subjects (e.g., Math) are selected and translated into Japanese, enabling a one-to-one comparison with its English counterpart MMMU; and (ii) a culture-specific (CS) subset, comprising newly crafted subjects that reflect Japanese cultural context. Using the CA subset, we observe a performance drop in many LMMs when evaluated in Japanese, which is purely attributable to language variation. Using the CS subset, we reveal their inadequate Japanese cultural understanding. Further, by combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding. We hope this work will not only help advance LMM performance in Japanese but also serve as a guideline to create high-standard, culturally diverse benchmarks for multilingual LMM development. The project page is https://mmmu-japanese-benchmark.github.io/JMMMU/.
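As a reading aid, the two comparisons described in the abstract can be expressed as two simple score gaps. The numbers below are invented purely for illustration and are not results from the paper: the first gap isolates the effect of switching language (the same questions asked in English versus Japanese), and the second is the kind of signal used to expose shallow cultural understanding (Japanese text handled well, Japanese culture not).

# Illustrative accuracies (%); made-up numbers, not results from the paper.
mmmu_english_score = 55.0  # original MMMU subjects, asked in English
jmmmu_ca_score = 50.0      # the same subjects translated into Japanese (CA subset)
jmmmu_cs_score = 40.0      # newly written Japanese-culture questions (CS subset)

# Drop attributable purely to language, since CA questions mirror MMMU one-to-one.
language_gap = mmmu_english_score - jmmmu_ca_score

# Gap between reading Japanese and understanding Japanese culture; a large value
# suggests language-only competence without cultural depth.
culture_gap = jmmmu_ca_score - jmmmu_cs_score

print(f"Language gap (English MMMU - JMMMU CA): {language_gap:.1f} points")
print(f"Culture gap (JMMMU CA - JMMMU CS): {culture_gap:.1f} points")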