CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

Chengqi Duan, Kaiyue Sun, Rongyao Fang, Manyuan Zhang, Yan Feng, Ying Luo, Yufang Liu, Ke Wang, Peng Pei, Xunliang Cai, Hongsheng Li, Yi Ma, Xihui Liu

2025-10-14

Summary

This paper introduces a new way for AI models to solve math problems that require visual reasoning, such as interpreting graphs or diagrams. Instead of reasoning in text alone, the AI 'thinks' with images: it writes code that draws plots and diagrams, and uses the rendered images as intermediate steps in its reasoning.

What's the problem?

Current AI models, even the really advanced ones like Large Language Models, struggle with math problems that need visual thinking. They're good at processing text, but when a problem requires drawing a line on a graph or interpreting a diagram, they often get stuck. Existing models that *can* handle images and text together aren't precise enough to reliably create helpful visuals for solving these problems.

What's the solution?

The researchers developed a system called CodePlot-CoT. Rather than producing a text-only answer, it generates executable code that creates plots and diagrams as it reasons – essentially, it 'shows its work' visually. To train the model, they built Math-VR, a large bilingual dataset and benchmark of 178K math problems with visual components. A key ingredient was a state-of-the-art image-to-code converter that accurately parses complex mathematical figures into code, which made it possible to create high-quality training data.
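To make the idea concrete, here is a minimal sketch (not the authors' implementation) of the code-driven "visual thought" loop: the model emits plotting code alongside its text reasoning, the code is executed to render an image, and the reasoning then continues from what the plot reveals. The `visual_thought` helper, the `plot_code` string, and the example problem are all illustrative assumptions.

```python
# Sketch of a code-driven reasoning step, in the spirit of CodePlot-CoT.
# The model's "visual thought" is a snippet of plotting code; we execute
# it to render an image the reasoning chain can refer to.

def visual_thought(code: str, filename: str = "thought.png") -> str:
    """Execute model-emitted plotting code and return the image path.

    Hypothetical helper: degrades gracefully if matplotlib is absent.
    """
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless rendering, no display needed
        namespace = {"OUTPUT": filename}
        exec(code, namespace)
        return filename
    except ImportError:
        return ""  # no plotting backend; skip the visual step

# Hypothetical model output for: "Where do y = x^2 and y = x + 2 meet?"
plot_code = """
import matplotlib.pyplot as plt
xs = [i / 10 for i in range(-30, 31)]
plt.plot(xs, [x * x for x in xs], label="y = x^2")
plt.plot(xs, [x + 2 for x in xs], label="y = x + 2")
plt.legend()
plt.savefig(OUTPUT)
"""
image_path = visual_thought(plot_code)

# The rendered plot shows the curves crossing twice; the text reasoning
# then pins the crossings down as the roots of x^2 - x - 2 = 0.
def bisect(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi] by bisection (sign change assumed)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

f = lambda x: x * x - x - 2
roots = [round(bisect(f, -2, 0), 6), round(bisect(f, 0, 3), 6)]
print(roots)  # intersections at x = -1 and x = 2
```

The point of the paradigm is the division of labor: the plot supplies the visual intuition (how many intersections, roughly where), while precise text/symbolic reasoning finishes the job – something text-only chains of thought struggle to do on their own.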

Why it matters?

This work is important because it pushes AI closer to being able to solve math problems the way humans do – by using both logic and visual reasoning. The new dataset and approach provide a foundation for future research in this area, and the publicly available resources will help other researchers build on this work to create even more powerful AI problem-solvers.

Abstract

Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have shown significant progress in mathematical reasoning, yet they still face a critical bottleneck with problems requiring visual assistance, such as drawing auxiliary lines or plotting functions to solve the problems. Most LLMs and VLMs are constrained to text-only reasoning chains, while multimodal unified models that can generate interleaved text and images lack the necessary precision and controllability for such tasks. To address this, we propose CodePlot-CoT, a code-driven Chain-of-Thought paradigm for "thinking with images" in mathematics. Our approach leverages the VLM to generate text reasoning as well as executable plotting code, which is then rendered into images as "visual thought", to solve mathematical problems. To achieve this, we first construct Math-VR, the first large-scale, bilingual dataset and benchmark for Mathematics problems with Visual Reasoning, comprising 178K samples. Second, to create high-quality training data, we develop a state-of-the-art image-to-code converter specialized for parsing complex mathematical figures into code. Finally, using these training data, we train the CodePlot-CoT model for solving mathematical problems. Experimental results show that our model achieves up to a 21% increase over the base model on our new benchmark, fully validating the efficacy of our proposed code-driven reasoning paradigm. Our work opens a new direction for multimodal mathematical reasoning and provides the community with the first large-scale dataset, comprehensive benchmark, and strong approach for such problems. To facilitate future research, we make our datasets, code, and pretrained models publicly available at https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT.