VisCoder2: Building Multi-Language Visualization Coding Agents

Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

2025-10-29

Summary

This paper focuses on improving computer programs that can automatically create visualizations, like charts and graphs, from instructions. These programs, called coding agents, are getting better thanks to advancements in large language models, but they still struggle with real-world tasks.

What's the problem?

Current coding agents aren't very reliable when it comes to making visualizations. They often cover only a narrow set of programming languages, the code they generate frequently fails to run, and they aren't good at fixing their own mistakes. A big part of the problem is that the datasets and benchmarks used to train and evaluate these agents are too narrow: they cover only simple, single-round tasks in a single language.

What's the solution?

The researchers created three things to help solve this. First, they built a large dataset called VisCode-Multi-679K with 679,000 validated, executable examples of visualization code in 12 different programming languages, including multi-turn conversations showing how to correct errors. Second, they designed a benchmark called VisPlotBench to thoroughly test these agents, checking both whether the generated code runs and whether the rendered visualization looks right. Finally, they developed a new family of models called VisCoder2, trained on this dataset, that can handle multiple languages and improve their answers through iterative self-debugging.
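The self-debug idea is simple to sketch: run the generated code, and if it crashes, feed the error message back to the model so it can revise its own output. Below is a minimal Python illustration of that loop. The `generate` callable is a hypothetical stand-in for any model API (it is not part of the VisCoder2 release), and the real protocol also renders and scores the resulting plot, which this sketch omits.

```python
import os
import subprocess
import sys
import tempfile


def run_code(code: str):
    """Execute a code string in a fresh interpreter; return (ok, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=30,
        )
        return proc.returncode == 0, proc.stderr
    finally:
        os.unlink(path)


def self_debug(generate, task: str, max_rounds: int = 3):
    """Generate code for `task`, execute it, and on failure feed the
    traceback back to `generate` for another attempt.

    `generate` is any callable mapping a prompt string to a code string;
    returns (final_code, succeeded).
    """
    prompt = task
    code = generate(prompt)
    for _ in range(max_rounds):
        ok, err = run_code(code)
        if ok:
            return code, True
        # Append the error so the model can revise its own output.
        prompt = f"{task}\n\nPrevious attempt failed with:\n{err}\nFix the code."
        code = generate(prompt)
    return code, run_code(code)[0]
```

A stub model that fails once and then succeeds shows the loop recovering:

```python
attempts = iter(["print(undefined_name)", "print('ok')"])
code, ok = self_debug(lambda prompt: next(attempts), "print ok")
# ok is True: the second attempt executed cleanly.
```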

Why it matters?

This work is important because it significantly improves the ability of computers to automatically create visualizations. VisCoder2 performs much better than existing open-source options and approaches the performance of advanced, closed-source models like GPT-4.1, reaching an 82.4% overall execution pass rate at the 32B scale. This means we're closer to having tools that can easily turn data into understandable visuals, which is crucial for many fields like science, business, and education.

Abstract

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale, supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4.1, with further gains from iterative self-debug, reaching 82.4% overall execution pass rate at the 32B scale, particularly in symbolic or compiler-dependent languages.