ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett

2025-05-20

Summary

This paper introduces ChartMuseum, a new benchmark designed to measure how well AI models that understand both images and text can answer questions about charts and graphs.

What's the problem?

The problem is that even the most capable AI models still struggle to answer questions about complicated charts, performing far below humans on the same tasks.

What's the solution?

To demonstrate this, the researchers created a benchmark called ChartMuseum, a collection of challenging questions about many different kinds of charts. They then tested popular AI models and found that these models frequently make mistakes on visually complex questions that humans answer with ease.
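To make the evaluation setup concrete, here is a minimal sketch of how one might pose a chart question to a vision-language model. This is not the paper's evaluation code: the model name, image path, and question are placeholders, and the OpenAI chat API is used purely for illustration.

```python
# A minimal sketch of chart question answering with a vision-language model,
# in the spirit of ChartMuseum's setup. Assumes OPENAI_API_KEY is set.
import base64
from openai import OpenAI

client = OpenAI()

def ask_about_chart(image_path: str, question: str) -> str:
    """Send a chart image plus a natural-language question to a VLM."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical example of a question that requires reading the chart itself:
# print(ask_about_chart("chart.png", "Which category grew fastest after 2020?"))
```

A benchmark like ChartMuseum then scores the model's free-form answers against human-written references, making it easy to compare model accuracy with human accuracy question by question.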

Why it matters?

This matters because it reveals that a large gap remains between AI and human performance in understanding visual information. Closing that gap is essential for building AI systems that must reason about data, graphs, and other visuals in the real world.

Abstract

A new benchmark, ChartMuseum, highlights the underperformance of large vision-language models in chart question answering, particularly for visually complex questions, compared to human accuracy.