ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

Duo Xu, Hao Cheng, Xin Lin, Zhen Xie, Hao Wang

2025-11-05


Summary

This research aims to improve how well multimodal large language models (MLLMs), which combine image and language understanding, interpret complex charts and graphs.

What's the problem?

Current computer programs struggle with understanding complicated charts that require multiple steps of reasoning and calculations, like those found in real-world data analysis. Existing datasets don't fully cover these challenging scenarios, limiting the ability to train programs to handle them effectively. It's hard to create enough realistic and diverse chart-based questions for these programs to learn from.

What's the solution?

The researchers built an automated, multi-stage pipeline that generates charts and questions from code. It first uses retrieval-augmented generation (RAG) to retrieve professional chart templates. It then uses chain-of-thought (CoT) strategies to generate code that simulates realistic data distributions; this code both renders the chart and performs the statistical calculations needed to answer each question. Model-based evaluation is used to improve chart diversity and data quality. The resulting dataset, ChartM^3, contains 38K charts and 142K question-answer pairs for training, plus 2,871 high-quality evaluation samples.

Why does it matter?

This work matters because it produces better training data for AI models that need to understand charts. In the authors' supervised fine-tuning and reinforcement learning experiments, training on the dataset improved reasoning and cross-domain generalization, allowing smaller models to reach performance comparable to larger ones on complex chart comprehension. That could make automated chart analysis more accessible and efficient.

Abstract

Complex chart understanding tasks demand advanced visual recognition and reasoning capabilities from multimodal large language models (MLLMs). However, current research provides limited coverage of complex chart scenarios and computation-intensive reasoning tasks prevalent in real-world applications. This study proposes an automated multi-stage code-driven pipeline for systematically generating visual reasoning datasets to address these limitations. The pipeline integrates retrieval-augmented generation (RAG) to retrieve professional chart templates and employs chain-of-thought (CoT) strategies to generate reasoning codes that simulate real data distributions, thereby driving chart rendering and question-related statistical computations. Through model-based evaluation, the pipeline enhances chart diversity and data quality. Using this framework, we construct ChartM^3, a multi-dimensional and multi-step dataset containing 38K charts and 142K Q&A pairs for training, along with 2,871 high-quality evaluation samples for enabling practical performance assessment. Supervised fine-tuning (SFT) and reinforcement learning (RL) experiments demonstrate that our dataset significantly improves reasoning capabilities and cross-domain generalization performance, enabling smaller models to achieve performance comparable to larger-scale models in complex chart comprehension.
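
As a rough illustration of the code-driven idea described above, the sketch below lets one piece of code simulate a data distribution, feed the chart, and compute the answer to a multi-step question, so the answer is derived from the same values that would be plotted. This is not the authors' pipeline: the function names, the sales scenario, the trend parameters, and the dictionary-based chart spec are all hypothetical, and the template retrieval, LLM generation, and model-based evaluation stages are omitted.

```python
import random

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]


def simulate_series(seed: int) -> dict[str, list[int]]:
    """Simulate two monthly sales series as a linear trend plus bounded noise."""
    rng = random.Random(seed)  # seeded so chart and answer are reproducible
    # (base, slope) pairs are made-up values for illustration only.
    trends = {"Product A": (100, 8), "Product B": (140, -3)}
    return {
        name: [base + slope * i + rng.randint(-10, 10) for i in range(len(MONTHS))]
        for name, (base, slope) in trends.items()
    }


def build_chart_spec(series: dict[str, list[int]]) -> dict:
    """Describe the chart to render; a real pipeline would pass this to a plotting library."""
    return {
        "type": "line",
        "title": "Monthly sales by product",
        "x": MONTHS,
        "series": series,
    }


def answer_question(series: dict[str, list[int]]) -> dict:
    """Compute a two-step answer directly from the simulated data."""
    a, b = series["Product A"], series["Product B"]
    # Step 1: locate the month where Product A peaks (first one on ties).
    peak = max(range(len(MONTHS)), key=lambda i: a[i])
    # Step 2: compare both products in that month.
    gap = a[peak] - b[peak]
    return {
        "question": (
            "In the month where Product A's sales peak, "
            "by how much do they exceed Product B's sales?"
        ),
        "reasoning": [
            f"Product A peaks in {MONTHS[peak]} at {a[peak]}.",
            f"Product B is {b[peak]} in {MONTHS[peak]}, "
            f"so the gap is {a[peak]} - {b[peak]} = {gap}.",
        ],
        "answer": gap,
    }


if __name__ == "__main__":
    data = simulate_series(seed=0)
    print(build_chart_spec(data))
    print(answer_question(data))
```

Because the chart spec and the answer both read from the same `series` dictionary, the question-answer pair cannot drift from what the chart shows, which is the main appeal of generating data in code rather than reading values back from a rendered image.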
