VeriGUI: Verifiable Long-Chain GUI Dataset
Shunyu Liu, Minghao Liu, Huichi Zhou, Zhenyu Cui, Yang Zhou, Yuhao Zhou, Wendong Fan, Ge Zhang, Jiajun Shi, Weihao Xuan, Jiaxing Huang, Shuang Luo, Fang Wu, Heli Qi, Qingcheng Zeng, Ziqi Ren, Jialiang Gao, Jindi Lv, Junjie Wang, Aosong Feng, Heng Zhou, Wangchunshu Zhou
2025-08-07
Summary
This paper introduces VeriGUI, a new dataset for evaluating and improving GUI agents, computer programs that interact with software interfaces. The dataset contains long, complex tasks made up of many smaller steps, each of which can be checked to confirm it was completed correctly.
What's the problem?
Current GUI agents mostly handle short, simple tasks and are evaluated only on their final result. This limits their usefulness for real-world tasks that require many steps and careful planning. Without verifying each intermediate step, agents can make mistakes that go undetected until the task is finished, making failures hard to diagnose.
What's the solution?
To solve this, VeriGUI provides tasks decomposed into detailed sequences of smaller, verifiable subtasks, so agents can be evaluated at every step of the process rather than only on the final outcome. The dataset covers both desktop and web environments and was annotated with the help of human experts to ensure accuracy.
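The step-level evaluation idea can be sketched in a few lines. The snippet below is an illustrative mock, not code from the paper: the names (Subtask, verify_trajectory) and the toy verifier functions are assumptions, but they show how subtask-level checking pinpoints exactly where a long-chain task went wrong, while outcome-only evaluation would yield a single pass/fail.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    """One verifiable step in a long-chain GUI task (hypothetical schema)."""
    description: str
    # Maps the agent's observed result for this step to pass/fail.
    verify: Callable[[str], bool]

def verify_trajectory(subtasks: List[Subtask], results: List[str]) -> List[bool]:
    """Return one pass/fail flag per subtask, enabling step-level diagnosis."""
    return [st.verify(res) for st, res in zip(subtasks, results)]

# Toy 3-step web task: search, open a result, extract a price.
task = [
    Subtask("type query into search box", lambda r: "query_submitted" in r),
    Subtask("open the first result", lambda r: r.startswith("page:")),
    Subtask("extract the listed price", lambda r: r.replace(".", "").isdigit()),
]
agent_results = ["query_submitted", "page:product-123", "19.99"]
flags = verify_trajectory(task, agent_results)
# Outcome-only evaluation would collapse this to a single boolean;
# the per-step flags show exactly which subtask (if any) failed.
```

A trajectory that fails mid-chain (for example, opening the wrong page) would produce a False flag at that position while later steps are still checked independently, which is the diagnostic benefit the paper attributes to subtask-level verifiability.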
Why does it matter?
This matters because GUI agents that can handle long, complex tasks could make software easier to use and automate many complicated workflows. By requiring verifiability at every step, VeriGUI pushes researchers to build more reliable agents that can plan and execute multi-step tasks in real computer environments.
Abstract
VeriGUI is a novel dataset for evaluating GUI agents in long-horizon tasks, emphasizing long-chain complexity and subtask-level verifiability.