ChartAB: A Benchmark for Chart Grounding & Dense Alignment
Aniruddh Bansal, Davit Soselia, Dang Nguyen, Tianyi Zhou
2025-10-31
Summary
This paper focuses on how well computers can 'understand' charts and graphs, specifically how well they can pull out details and compare information presented visually. It introduces a new way to test these systems, called the ChartAlign Benchmark (ChartAB).
What's the problem?
Current computer systems that combine vision (seeing images) and language (understanding text) aren't very good at accurately reading charts. They struggle with noticing small details, figuring out the structure of the chart, and comparing different charts to each other. This makes it hard for them to actually *use* the information in charts for things like data analysis or making decisions.
What's the solution?
The researchers created a new benchmark, ChartAB, a set of tests designed specifically to challenge these systems on chart-understanding skills: reading off the underlying data, locating parts of a chart, and recognizing its visual attributes. The systems answer in a structured (JSON) format, which makes it possible to score each of these skills precisely. The benchmark also includes a two-step process to check whether the systems can correctly match up information *between* two different charts. A simplified sketch of this kind of scoring appears below.
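To make the scoring idea concrete, here is a minimal sketch of how an extracted chart could be compared against ground truth. The field names, tolerance, and metric below are illustrative assumptions, not the benchmark's actual schema or evaluation code.

```python
# Hypothetical illustration: score a model's extracted chart data against
# ground truth. The JSON layout and the cell-level accuracy metric are
# assumptions for illustration only.

ground_truth = {
    "title": "Quarterly Revenue",
    "data": {"Q1": 12.0, "Q2": 15.5, "Q3": 14.2, "Q4": 18.0},
}

model_output = {
    "title": "Quarterly Revenue",
    "data": {"Q1": 12.0, "Q2": 13.0, "Q3": 14.2, "Q4": 18.0},
}

def cell_accuracy(truth: dict, pred: dict, tol: float = 0.05) -> float:
    """Fraction of data cells the model recovered within a relative tolerance."""
    cells = truth["data"]
    correct = 0
    for key, true_val in cells.items():
        pred_val = pred.get("data", {}).get(key)
        if pred_val is not None and abs(pred_val - true_val) <= tol * abs(true_val):
            correct += 1
    return correct / len(cells)

print(f"cell accuracy: {cell_accuracy(ground_truth, model_output):.2f}")  # 0.75
```

Because both the prediction and the ground truth live in the same structured format, each grounding skill (data values, element locations, attributes) can be scored by a metric tailored to that field rather than by fuzzy text matching.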
Why it matters?
This work is important because it shows where current computer systems fall short when it comes to understanding charts. By pinpointing these weaknesses, it helps researchers improve these systems so they can better analyze data and assist humans in tasks that require visual information. It highlights the specific skills these systems need to develop to truly 'understand' what charts are showing.
Abstract
Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate perception of details and struggle to extract fine-grained structures from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason over them. In this paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics specifically tailored for each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations on several recent VLMs reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding. These findings highlight the fine-grained discrepancies among VLMs in chart understanding tasks and point to specific skills that need to be strengthened in current models.
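The abstract mentions a two-stage inference workflow for aligning elements across two charts. The sketch below shows one plausible shape such a workflow could take, assuming a generic `query_vlm(images, prompt)` helper; the helper, prompts, and JSON schema are illustrative assumptions, not the paper's actual protocol.

```python
# A minimal sketch of a two-stage compare-two-charts workflow. The
# `query_vlm` helper is a placeholder for whatever VLM is being evaluated.

import json

def query_vlm(images: list, prompt: str) -> str:
    """Placeholder: send images plus a text prompt to the model under test."""
    raise NotImplementedError("wire this to your model's API")

EXTRACT_PROMPT = (
    "Extract the chart into JSON with keys 'title', 'axes', and 'data' "
    "(a mapping from category to value)."
)

def two_stage_compare(chart_a, chart_b) -> str:
    # Stage 1: ground each chart independently into a structured representation.
    json_a = json.loads(query_vlm([chart_a], EXTRACT_PROMPT))
    json_b = json.loads(query_vlm([chart_b], EXTRACT_PROMPT))

    # Stage 2: ask the model to align the two extractions and report differences.
    compare_prompt = (
        "Chart A: " + json.dumps(json_a) + "\n"
        "Chart B: " + json.dumps(json_b) + "\n"
        "List the categories whose values differ between the two charts."
    )
    return query_vlm([chart_a, chart_b], compare_prompt)
```

Separating extraction (stage 1) from alignment (stage 2) makes it possible to tell whether a comparison error comes from misreading an individual chart or from failing to match corresponding elements across charts.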