LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Yiming Wang, Da Yin, Yuedong Cui, Ruichen Zheng, Zhiqian Li, Zongyu Lin, Di Wu, Xueqing Wu, Chenchen Ye, Yu Zhou, Kai-Wei Chang

2025-10-17

Summary

This paper introduces a new way to create training data for digital agents, which are essentially AI programs that interact with computers like a human would. It focuses on making these agents better at using user interfaces (UIs) like websites and apps.

What's the problem?

Training digital agents to reliably use different UIs is really hard because it requires a huge amount of example data showing them how to interact with things. Getting this data is expensive and time-consuming, as it needs people to manually demonstrate actions and requires a lot of computing resources to manage and process everything.

What's the solution?

The researchers developed a system called UI-Simulator that automatically generates this training data. It uses an LLM as a simulated digital world to create diverse UI states, intelligently explores how to move between those states, and finally packages that exploration into usable training examples. They also created a method called UI-Simulator-Grow that focuses on creating the *most* helpful training data, making the process even more efficient. Essentially, it doesn't just make a lot of data; it makes smart data.
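The three-stage pipeline described above (simulate UI states, explore with a guided rollout, wrap into training examples) can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual code: the function names (`simulate_next_state`, `guided_rollout`, `wrap_trajectory`) and the canned transition table standing in for the LLM simulator are all assumptions.

```python
def simulate_next_state(state: str, action: str) -> str:
    """Stand-in for the LLM world simulator: given the current UI state
    and an action, return the next UI state. A canned transition table
    replaces the actual LLM call in this sketch."""
    transitions = {
        ("home", "click search"): "search_page",
        ("search_page", "type query"): "results_page",
        ("results_page", "click first result"): "item_page",
    }
    return transitions.get((state, action), state)  # unknown action: stay put

def guided_rollout(start_state: str, action_plan: list[str]) -> list[dict]:
    """Coherent exploration: follow a task-guided action plan and record
    each (state, action, next_state) transition."""
    steps, state = [], start_state
    for action in action_plan:
        nxt = simulate_next_state(state, action)
        steps.append({"state": state, "action": action, "next_state": nxt})
        state = nxt
    return steps

def wrap_trajectory(task: str, steps: list[dict]) -> dict:
    """Trajectory wrapper: package the rollout as one training example."""
    return {"instruction": task,
            "steps": steps,
            "final_state": steps[-1]["next_state"]}

traj = wrap_trajectory(
    "Find the first search result",
    guided_rollout("home", ["click search", "type query", "click first result"]),
)
print(traj["final_state"])  # item_page
```

In the real system, the transition table would be an LLM prompted to emit the next structured UI state, which is what makes the simulator general-purpose rather than tied to one website or app.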

Why it matters?

This work is important because it allows for the creation of more robust and capable digital agents without the massive cost of collecting real-world data. The results show that agents trained with UI-Simulator can perform as well as, or better than, agents trained on real UIs. It can even allow smaller AI models to match the performance of much larger ones, opening up possibilities for more accessible and efficient AI.

Abstract

Digital agents require diverse, large-scale UI trajectories to generalize across real-world tasks, yet collecting such data is prohibitively expensive from human-annotation, infrastructure, and engineering perspectives. To this end, we introduce UI-Simulator, a scalable paradigm that generates structured UI states and transitions to synthesize training trajectories at scale. Our paradigm integrates a digital world simulator for diverse UI states, a guided rollout process for coherent exploration, and a trajectory wrapper that produces high-quality and diverse trajectories for agent training. We further propose UI-Simulator-Grow, a targeted scaling strategy that enables more rapid and data-efficient scaling by prioritizing high-impact tasks and synthesizing informative trajectory variants. Experiments on WebArena and AndroidWorld show that UI-Simulator rivals or surpasses open-source agents trained on real UIs with significantly better robustness, despite using weaker teacher models. Moreover, UI-Simulator-Grow matches the performance of Llama-3-70B-Instruct using only Llama-3-8B-Instruct as the base model, highlighting the potential of the targeted synthesis-scaling paradigm to continuously and efficiently enhance digital agents.
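The targeted-scaling idea behind UI-Simulator-Grow (prioritize high-impact tasks, then synthesize variants of them) can be sketched as below. This is a guess at the shape of the strategy, not the paper's implementation: the failure-rate scoring rule, the sample task names, and the string-tagged "variants" are all illustrative assumptions; the real system would use an LLM and the paper's own selection criterion.

```python
def select_high_impact_tasks(task_success: dict[str, float], k: int = 2) -> list[str]:
    """Rank tasks by current success rate (lowest first) and keep the
    top-k as the highest-impact targets for more training data.
    The scoring rule here is an assumption."""
    return sorted(task_success, key=task_success.get)[:k]

def synthesize_variants(task: str, n: int = 3) -> list[str]:
    """Produce informative variants of a task. A real system would
    prompt an LLM; this sketch just tags placeholder variants."""
    return [f"{task} (variant {i})" for i in range(1, n + 1)]

# Hypothetical per-task success rates measured on the current agent.
success_rates = {"book flight": 0.9, "filter products": 0.3, "reset password": 0.5}

targets = select_high_impact_tasks(success_rates, k=2)
new_tasks = [v for t in targets for v in synthesize_variants(t)]
print(targets)  # ['filter products', 'reset password']
```

The design choice this mirrors is data efficiency: instead of synthesizing uniformly across all tasks, the loop concentrates new trajectories where the agent currently fails most, which is how a small base model can close the gap to a much larger one.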