
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, Shenglong Ye, Qingyun Li, Zeyue Tian, Gen Luo, Xiangyu Yue, Biqing Qi, Kai Chen, Bowen Zhou, Yu Qiao, Qifeng Chen, Wenhai Wang

2025-09-19


Summary

This paper introduces ScaleCUA, a project focused on building better computer use agents that can operate software and websites the way a person would, by greatly expanding the amount of data available for training such agents.

What's the problem?

Currently, creating computer agents that can reliably use graphical user interfaces (GUIs) – things like clicking buttons and filling out forms – is difficult because there isn't a lot of publicly available data showing how people actually *use* computers across different operating systems and for different tasks. Without enough good examples, these agents don't learn to perform tasks well and aren't very adaptable.

What's the solution?

The researchers created ScaleCUA, a large dataset spanning six different operating systems and three task domains. They built it with a closed-loop pipeline in which automated agents generate data and human experts verify and refine it. They then trained a new computer use agent on this scaled-up dataset, allowing it to work effectively across various platforms and tasks. This agent significantly outperformed previous models on several standard benchmarks.
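The closed-loop idea described above can be illustrated with a toy sketch. This is not the paper's actual pipeline; all function and class names here are hypothetical, and the "expert review" stage is reduced to a stub that simply approves each trajectory. The point is only the loop structure: automated agents propose trajectories, a verification stage filters them, and the verified pool grows each round.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One recorded sequence of GUI actions for a task (illustrative)."""
    task: str
    actions: list = field(default_factory=list)
    verified: bool = False


def agent_rollout(task: str) -> Trajectory:
    # Hypothetical automated agent: propose a sequence of GUI actions.
    return Trajectory(task=task, actions=[f"click('{task}-button')"])


def expert_review(traj: Trajectory) -> Trajectory:
    # Stand-in for the human-expert stage: approve (or correct) the trajectory.
    traj.verified = True
    return traj


def closed_loop_collect(tasks: list[str], rounds: int = 2) -> list[Trajectory]:
    """Toy closed loop: each round, agents generate trajectories, experts
    verify them, and only verified ones enter the training pool."""
    dataset: list[Trajectory] = []
    for _ in range(rounds):
        for task in tasks:
            traj = expert_review(agent_rollout(task))
            if traj.verified:
                dataset.append(traj)
    return dataset


data = closed_loop_collect(["open-settings", "fill-form"])
print(len(data))  # 2 tasks x 2 rounds = 4 verified trajectories
```

In the real system, the review stage would reject or correct faulty trajectories and the agent would be retrained between rounds, which is what makes the loop "closed."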

Why it matters?

This work is important because it shows that scaling up training data is a key way to improve the performance of computer use agents. By releasing the dataset, models, and code, the researchers hope to encourage further development in this field, ultimately leading to more capable automated computer assistants.

Abstract

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.