DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Jinxiang Meng, Shaoping Huang, Fangyu Lei, Jingyu Guo, Haoxiang Liu, Jiahao Su, Sihan Wang, Yao Wang, Enrui Wang, Ye Yang, Hongze Chai, Jinming Lv, Anbang Yu, Huangjing Zhang, Yitong Zhang, Yiming Huang, Zeyao Ma, Shizhu He, Jun Zhao, Kang Liu

2026-04-29

Summary

This paper introduces DV-World, a new way to test how well computer programs can create and understand data visualizations like charts and dashboards, mimicking real-world tasks a data analyst might do.

What's the problem?

Current tests for data visualization programs are too simple and don't reflect the challenges of working with real data and changing requirements. They often limit programs to running in isolated environments, only allow creation from scratch, and assume the program perfectly understands what the user wants. This means programs that do well on these tests might still struggle in a professional setting.

What's the solution?

The researchers created DV-World, a benchmark of 260 tasks spanning three areas: manipulating spreadsheets and repairing charts natively (DV-Sheet), adapting existing visualizations to new data across different programming paradigms (DV-Evolution), and clarifying ambiguous requests from a simulated user (DV-Interact). They also developed an automatic evaluation framework that checks whether visualizations are numerically correct and visually sensible, combining direct value comparison against the source tables with a multimodal AI judge that scores results against task-specific rubrics.
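To make the value-comparison idea concrete, here is a minimal, hypothetical sketch of what a "Table-value Alignment" check could look like: values extracted from a generated chart are compared against the ground-truth table within a small tolerance. The function name, data layout, and tolerance are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of a table-value alignment check (not DV-World's
# actual code): score a chart by the fraction of ground-truth table
# values it reproduces within a relative tolerance.
import math


def table_value_alignment(chart_values, table_values, rel_tol=1e-3):
    """Return the fraction of ground-truth values matched by the chart.

    chart_values, table_values: dicts mapping a series label (e.g. a
    column or legend entry) to a list of numbers.
    """
    matched = total = 0
    for label, truth in table_values.items():
        produced = chart_values.get(label, [])
        for i, expected in enumerate(truth):
            total += 1
            # A value counts as correct only if present and close enough.
            if i < len(produced) and math.isclose(
                produced[i], expected, rel_tol=rel_tol
            ):
                matched += 1
    return matched / total if total else 0.0


# Example: first series fully matched; one value missing in the second.
truth = {"2023": [10.0, 20.0], "2024": [30.0, 40.0]}
chart = {"2023": [10.0, 20.0], "2024": [30.0]}
print(table_value_alignment(chart, truth))  # 0.75
```

A real checker would also need to extract the plotted values from the chart artifact (e.g. from the spreadsheet chart definition or the plotting code's data), which is the harder part; the paper pairs this numeric check with an MLLM judge for the visual and semantic aspects that value comparison cannot capture.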

Why it matters?

DV-World is important because it provides a more realistic and challenging test for data visualization programs. The results show that even the best current programs aren't very good at handling these complex tasks, highlighting areas where further development is needed to create tools that can truly assist data analysts in real-world workflows.

Abstract

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and the assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact for proactive intent alignment with a user simulator that mimics real-world ambiguous requirements. Our hybrid evaluation framework integrates Table-value Alignment for numerical precision and MLLM-as-a-Judge with rubrics for semantic-visual assessment. Experiments reveal that state-of-the-art models achieve less than 50% overall performance, exposing critical deficits in handling the complex challenges of real-world data visualization. DV-World provides a realistic testbed to steer development toward the versatile expertise required in enterprise workflows. Our data and code are available at https://github.com/DA-Open/DV-World.