DocDancer: Towards Agentic Document-Grounded Information Seeking

Qintong Zhang, Xinjie Lv, Jialong Wu, Baixuan Li, Zhengwei Tao, Guochen Yan, Huanyao Zhang, Bin Wang, Jiahao Xu, Haitao Mi, Wentao Zhang

2026-01-09

Summary

This paper introduces DocDancer, a new open-source agent designed to answer questions based on information found in documents. It tackles the challenge of building a document question answering (DocQA) system that can effectively find and understand relevant information within those documents.

What's the problem?

Current systems that answer questions from documents often struggle to use tools effectively and frequently rely on closed-source models that aren't publicly available. This makes it hard for researchers to build upon existing work and create better systems. Also, there is little high-quality training data specifically designed to teach these systems how to explore documents and then put the information together to answer questions.

What's the solution?

The researchers created DocDancer, which works by first exploring the document to find relevant information, and then synthesizing that information to form an answer. To overcome the lack of training data, they developed a method to automatically create realistic training examples. They then trained DocDancer on this new data and tested it on two challenging document understanding tasks, showing it performs well.
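The explore-then-synthesize loop described above can be sketched in miniature. This is a toy illustration only: the tool names (`search_pages`, `read_page`) and the keyword-matching logic are hypothetical stand-ins for DocDancer's actual LLM-driven tools, not the paper's implementation.

```python
def search_pages(document, query):
    """Toy 'exploration' tool: return indices of pages matching a query term.

    document is a list of page strings; a real agent would use retrieval
    or an LLM tool call here.
    """
    terms = [t.strip("?.,!").lower() for t in query.split()]
    return [i for i, page in enumerate(document)
            if any(t and t in page.lower() for t in terms)]

def read_page(document, index):
    """Toy 'comprehension' tool: fetch the full text of one page."""
    return document[index]

def answer(document, question, max_pages=3):
    """Explore first, then synthesize an answer from the gathered evidence."""
    # Exploration phase: locate candidate pages and read them.
    evidence = [read_page(document, i)
                for i in search_pages(document, question)[:max_pages]]
    # Synthesis phase: DocDancer would call a trained model here;
    # joining the evidence is a placeholder for that step.
    return " ".join(evidence) if evidence else "No relevant passage found."

doc = [
    "Page 1: unrelated background material.",
    "Page 2: DocDancer is an open-source DocQA agent.",
    "Page 3: appendix tables.",
]
print(answer(doc, "What is DocDancer?"))
```

The point of the two-phase structure is that evidence gathering and answer generation are separate, explicitly modeled steps, which is also what makes it possible to synthesize training trajectories for each phase.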

Why it matters?

This work is important because it provides an open-source solution for document question answering, allowing anyone to use and improve upon it. It also demonstrates a new way to generate training data for these types of systems, which could lead to more accurate and reliable DocQA agents in the future. This is useful for things like quickly finding information in legal documents, research papers, or long reports.

Abstract

Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Trained on the synthesized data, our models demonstrate their effectiveness on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data.