ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, Wenhai Wang, Yu Qiao, Xizhou Zhu, Jifeng Dai
2025-05-30
Summary
This paper introduces ZeroGUI, a framework that trains AI agents to operate graphical user interfaces (GUIs) automatically, with minimal human involvement.
What's the problem?
Training an AI agent to operate websites or apps, for example by clicking buttons and filling in forms, usually takes a great deal of human time and effort to guide the learning process.
What's the solution?
The researchers built ZeroGUI, which uses vision-language models (AI models that understand both images and language) to generate training tasks and to judge how well the agent performed them. The agent can then learn to operate interfaces by interacting with them directly, with almost no human help.
Why does it matter?
This matters because it could make it easier and faster to develop programs that use websites or apps on their own, which could help automate many online tasks and improve how AI interacts with digital tools.
Abstract
ZeroGUI is an online learning framework that uses Vision-Language Models for task generation and reward estimation, enhancing GUI Agents' performance with minimal human intervention.
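The two roles the abstract assigns to the Vision-Language Model, task generation and reward estimation, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: every name below (`Rollout`, `collect_rollouts`, and the `toy_*` functions) is hypothetical, the "screenshots" are plain strings, and the model calls are replaced by deterministic stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    """One attempt at one task (hypothetical structure, for illustration)."""
    task: str
    screenshots: List[str]  # strings standing in for screen images
    reward: float = 0.0


def collect_rollouts(
    propose_tasks: Callable[[str, int], List[str]],
    run_agent: Callable[[str], List[str]],
    judge_success: Callable[[str, List[str]], bool],
    start_screen: str,
    n_tasks: int,
) -> List[Rollout]:
    """Generate tasks, let the agent attempt them, and score each attempt.

    In a real system, propose_tasks and judge_success would be calls to a
    vision-language model; here they are injected so stand-ins can be used.
    """
    rollouts = []
    for task in propose_tasks(start_screen, n_tasks):
        screenshots = run_agent(task)
        # Binary reward: 1.0 if the judge says the task was completed.
        reward = 1.0 if judge_success(task, screenshots) else 0.0
        rollouts.append(Rollout(task, screenshots, reward))
    return rollouts


# --- Toy stand-ins (assumptions, not real model calls) ---

def toy_propose_tasks(screen: str, n: int) -> List[str]:
    # A real task generator would look at the screen and invent varied goals.
    return [f"click button {i} on {screen}" for i in range(n)]


def toy_run_agent(task: str) -> List[str]:
    # Pretend the agent fails on one particular task.
    if "button 1" in task:
        return ["start", "error dialog"]
    return ["start", f"done: {task}"]


def toy_judge_success(task: str, screenshots: List[str]) -> bool:
    # A real judge would inspect the screenshots; this one compares strings.
    return screenshots[-1] == f"done: {task}"


rollouts = collect_rollouts(
    toy_propose_tasks, toy_run_agent, toy_judge_success, "settings page", 3
)
for r in rollouts:
    print(r.task, "->", r.reward)
```

The rewards collected this way would then feed whatever update rule the framework uses to improve the agent; that step is outside this sketch. Passing the three roles in as functions keeps the loop itself independent of any particular model or environment.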