Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
Bohao Tang, Yan Ma, Fei Zhang, Jiadi Su, Ethan Chern, Zhulin Hu, Zhixin Wang, Pengfei Liu, Ya Zhang
2025-09-12
Summary
This paper explores how to make Vision-Language Models (VLMs), AI systems that can interpret both images and text, better at understanding charts and graphs.
What's the problem?
Current methods for chart understanding have weaknesses. Some rely on a fixed set of pre-made tools, which limits their flexibility; others commit to a single way of thinking, like breaking the problem into text steps, whose intermediate reasoning is hard to check for accuracy. When the reasoning process is opaque, it is difficult to 'reward' the AI for getting the right answer, which makes the system hard to improve.
What's the solution?
The researchers came up with a method called 'Code-as-Thought' (CaT), in which the AI translates the visual information in a chart into symbolic code. However, they found that *always* using code does not work for all charts. So they introduced 'Visual Programmability,' which lets the AI *learn* when to use code and when to analyze the chart directly. The AI is trained with a special dual reward that encourages both accurate answers and smart decisions about which approach to use; a rough sketch of this reward follows.
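As a rough illustration only, here is a minimal Python sketch of how such a dual reward could be combined during training. The function name `dual_reward`, the 0.5/0.5 accuracy split, and the weights `w_acc` and `w_dec` are assumptions made for this sketch, not the paper's exact formulation.

```python
def dual_reward(answer, gold_answer, extracted_data, gold_data,
                chose_code, code_was_suitable,
                w_acc=1.0, w_dec=0.5):
    """Hypothetical dual-reward signal for RL training (illustrative only).

    Combines a data-accuracy term (do the extracted data and the final
    answer match the ground truth?) with a decision term (did the model
    pick the right pathway: code vs. direct visual reasoning?).
    """
    # Data-accuracy reward: grounds the model in facts and penalizes
    # numerical hallucination in the transcribed chart data.
    r_accuracy = 0.5 * float(answer == gold_answer) \
               + 0.5 * float(extracted_data == gold_data)

    # Decision reward: pays off only when the chosen pathway matches the
    # chart's actual "Visual Programmability", so the model cannot
    # blindly default to a single reasoning mode.
    r_decision = float(chose_code == code_was_suitable)

    return w_acc * r_accuracy + w_dec * r_decision
```

Under these illustrative defaults, `dual_reward('42', '42', [1, 2], [1, 2], chose_code=True, code_was_suitable=True)` returns 1.5: full accuracy credit plus the bonus for correctly choosing the code pathway.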
Why it matters?
This work is important because it shows that AI can be taught not just *what* to think, but *how* to think. By learning to choose the best reasoning strategy for each chart, these models become more reliable and capable of handling complex visual information, which is a step towards more intelligent AI systems.
Abstract
Chart understanding presents a critical test of the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face significant limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach that represents the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines whether a chart-question pair is better solved with code or with direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual-reasoning pathway. The model's selection policy is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward, which grounds the model in facts and prevents numerical hallucination, with a decision reward that teaches the model when to use each strategy and prevents it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.
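To make the adaptive two-pathway idea concrete, below is a minimal sketch of what inference could look like. The helper `vlm_generate(image, prompt)`, the CODE/VISUAL tags, the prompt wording, and the `answer` variable are all hypothetical stand-ins; the paper's actual interface may differ, and a real system would sandbox any generated code before executing it.

```python
def answer_chart_question(vlm_generate, chart_image, question):
    """Sketch of the adaptive framework: pick a pathway, then solve."""
    # Step 1: assess Visual Programmability -- is this chart-question
    # pair better solved with code or with direct visual analysis?
    decision = vlm_generate(
        chart_image,
        f"Question: {question}\n"
        "Reply CODE if the chart's data can be faithfully transcribed "
        "into a program, otherwise reply VISUAL.",
    )

    if decision.strip() == "CODE":
        # Code-as-Thought pathway: transcribe the chart into symbolic,
        # verifiable code, then execute it to obtain the answer.
        program = vlm_generate(
            chart_image,
            "Write Python that reconstructs the chart's data and stores "
            f"the final result for this question in `answer`: {question}",
        )
        namespace = {}
        exec(program, namespace)  # NOTE: sandbox this in a real system
        return namespace.get("answer")

    # Direct visual reasoning pathway, for dense or unstructured charts
    # where symbolic representation is unsuitable.
    return vlm_generate(chart_image, f"Answer directly: {question}")
```

The point of the sketch is the branch itself: the decision reward described above trains the model's policy for taking it, so at inference the pathway choice is learned rather than hard-coded.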