MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue
2025-05-28
Summary
This paper introduces a new benchmark for testing how well large AI models that work with both text and images can reason logically and solve problems.
What's the problem?
The problem is that even though these models are very good at understanding language and images, it's not clear how well they can actually reason and make sound logical decisions, especially across different kinds of thinking: spotting patterns, drawing conclusions from rules, or working out the most likely explanation.
What's the solution?
The researchers built a benchmark called MME-Reasoning that measures how these models handle different kinds of logical reasoning: inductive (spotting patterns from examples), deductive (applying rules to reach conclusions), and abductive (finding the most likely explanation). They used it to pinpoint where current models do well and where they struggle.
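To make the idea of per-type evaluation concrete, here is a minimal sketch (not the authors' actual code) of how accuracy could be broken down by reasoning type; the `questions` records and their field names are hypothetical placeholders for benchmark items and model outputs.

```python
from collections import defaultdict

# Hypothetical benchmark items: each question is tagged with one reasoning type
# and paired with the model's answer.
questions = [
    {"id": 1, "type": "inductive", "answer": "B",  "model_answer": "B"},
    {"id": 2, "type": "deductive", "answer": "17", "model_answer": "15"},
    {"id": 3, "type": "abductive", "answer": "C",  "model_answer": "C"},
]

def per_type_accuracy(items):
    """Group items by reasoning type and compute accuracy for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in items:
        total[q["type"]] += 1
        if q["model_answer"].strip().lower() == q["answer"].strip().lower():
            correct[q["type"]] += 1
    return {t: correct[t] / total[t] for t in total}

print(per_type_accuracy(questions))
# e.g. {'inductive': 1.0, 'deductive': 0.0, 'abductive': 1.0}
```

A breakdown like this is what exposes the kind of imbalance the abstract describes: a model can look strong on average while failing badly on one particular type of reasoning.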
Why it matters?
This matters because if we want to trust AI to help us make decisions or solve complex problems, we need to know how good these models really are at reasoning, not just at recognizing words and images. The benchmark highlights their strengths and weaknesses so future models can be improved.
Abstract
MME-Reasoning is a comprehensive benchmark that evaluates the logical reasoning capabilities of multimodal large language models (MLLMs), revealing significant limitations and performance imbalances across inductive, deductive, and abductive reasoning types.