Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Boammani Aser Lompo, Marc Haraoui
2025-09-15
Summary
This paper introduces Visual-TableQA, a new dataset designed to test how well AI models can understand and reason about information presented in tables, specifically tables rendered as images rather than plain text. It aims to push the boundaries of what current models can do when visual and textual data are combined.
What's the problem?
Existing datasets for evaluating AI's ability to understand tables are too small, too narrow, or not challenging enough, especially when the tables are presented as images rather than plain text. This makes it hard to accurately measure and improve AI's reasoning skills on real-world table data.
What's the solution?
The researchers built Visual-TableQA with a pipeline in which multiple large language models collaborate across distinct roles: one generates tables and questions, another validates their quality, and a third supplies 'inspiration' for new ideas. Stronger models seed table layouts and topics that weaker models then elaborate, yielding a wide variety of table structures and reasoning challenges. The result is a dataset of 2,500 LaTeX-rendered tables and 6,000 question-answer pairs, produced for under USD 100.
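The role-based pipeline described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the model calls are stubbed out, and the role names, jury size, and approval threshold are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of a multi-role LLM data-generation pipeline:
# an "inspiration" model seeds topics/layouts, a "generator" model
# elaborates them into tables and QA pairs, and an LLM jury filters
# the results. All model calls are stubs.

def inspiration_model(seed_topics):
    """Stronger model: seeds table topics and layout ideas (stubbed)."""
    return [{"topic": t, "layout": "multirow"} for t in seed_topics]

def generator_model(idea):
    """Weaker model: elaborates a seed into a LaTeX table plus QA pairs (stubbed)."""
    table = f"% LaTeX table about {idea['topic']} ({idea['layout']})"
    qa_pairs = [{"question": f"Q about {idea['topic']}", "answer": "A"}]
    return {"table": table, "qa": qa_pairs}

def jury_filter(item, jurors=3, threshold=2):
    """LLM-jury filtering: keep an item only if enough jurors approve.

    Here every stub juror approves; a real juror would be another model
    scoring the table/QA pair for correctness and difficulty.
    """
    votes = sum(juror(item) for juror in [lambda x: 1] * jurors)
    return votes >= threshold

def build_dataset(seed_topics):
    ideas = inspiration_model(seed_topics)
    items = [generator_model(idea) for idea in ideas]
    return [it for it in items if jury_filter(it)]

dataset = build_dataset(["astronomy", "finance"])
print(len(dataset))  # one validated table/QA item per seed topic
```

In the real system each stub would be a call to a different LLM, and rejected items would be discarded or regenerated; the sketch only shows how the generation, validation, and inspiration roles compose.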
Why it matters?
This dataset matters because it provides a more realistic and demanding test for AI models. Models fine-tuned on Visual-TableQA perform better on other table-understanding benchmarks, even outperforming some proprietary models. This shows the dataset is effective at improving AI's ability to reason over visual information in tables, a crucial skill for many real-world applications.
Abstract
Visual reasoning over structured data such as tables is a critical capability for modern vision-language models (VLMs), yet current benchmarks remain limited in scale, diversity, or reasoning depth, especially when it comes to rendered table images. Addressing this gap, we introduce Visual-TableQA, a large-scale, open-domain multimodal dataset specifically designed to evaluate and enhance visual reasoning over complex tabular data. Our generation pipeline is modular, scalable, and fully autonomous, involving multiple reasoning LLMs collaborating across distinct roles: generation, validation, and inspiration. Visual-TableQA comprises 2.5k richly structured LaTeX-rendered tables and 6k reasoning-intensive QA pairs, all produced at a cost of under USD 100. To promote diversity and creativity, our pipeline performs multi-model collaborative data generation via cross-model prompting ('inspiration') and LLM-jury filtering. Stronger models seed layouts and topics that weaker models elaborate, collectively distilling diverse reasoning patterns and visual structures into the dataset. Empirical results show that models fine-tuned on Visual-TableQA generalize robustly to external benchmarks, outperforming several proprietary models despite the dataset's synthetic nature. The full pipeline and resources are publicly available at https://github.com/AI-4-Everyone/Visual-TableQA.