
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Zhengxi Lu, Jiabo Ye, Fei Tang, Yongliang Shen, Haiyang Xu, Ziwei Zheng, Weiming Lu, Ming Yan, Fei Huang, Jun Xiao, Yueting Zhuang

2025-09-16


Summary

This paper introduces Semi-online Reinforcement Learning, a new way to train computer programs to interact with graphical user interfaces like those on your phone or computer. The goal is to make these programs better at completing complex, multi-step tasks within apps.

What's the problem?

Training these programs is tricky because the two main approaches each have drawbacks. One, called offline reinforcement learning, learns from recordings of people using apps, but it struggles when a task takes many steps because it gets no clear feedback about the trajectory as a whole. The other, online reinforcement learning, lets the program learn by actually trying things out in the app, but this is slow, provides only sparse feedback, and is expensive because it requires a lot of interaction with the real environment.

What's the solution?

The researchers propose Semi-online Reinforcement Learning, which tries to get the best of both worlds: it simulates the 'trying things out' of online learning, but does so over the pre-recorded trajectories used in offline learning. A 'Patch Module' keeps the program on track by comparing its actions to those of a human expert and correcting the dialogue when it drifts. They also changed how the program learns from rewards so that it weighs not just immediate results but also discounted future returns. Finally, they propose Semi-Online Performance (SOP), a new metric that better reflects real-world, online performance.
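
To make the rollout-and-patch idea concrete, here is a minimal Python sketch of one semi-online rollout over a recorded trajectory. The names (Step, semi_online_rollout), the 1.0 reward for matching the expert action, and the GAMMA value are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

GAMMA = 0.9  # discount factor for future returns (illustrative value)

@dataclass
class Step:
    observation: str     # e.g. a screenshot or its description
    expert_action: str   # the action recorded in the offline trajectory

def semi_online_rollout(policy: Callable[[str, List[str]], str],
                        trajectory: List[Step]) -> List[float]:
    """One semi-online rollout: the model acts on recorded observations,
    its own outputs stay in the dialogue history, and divergences from the
    expert are patched so the rollout can continue along the trajectory."""
    dialogue: List[str] = []
    step_rewards: List[float] = []
    for step in trajectory:
        action = policy(step.observation, dialogue)
        dialogue.append(action)                      # preserve the model's own output
        matched = (action == step.expert_action)
        step_rewards.append(1.0 if matched else 0.0)
        if not matched:
            dialogue[-1] = step.expert_action        # "patch" back onto the expert path

    # Discounted future returns: each step's signal also credits later rewards.
    returns: List[float] = []
    running = 0.0
    for r in reversed(step_rewards):
        running = r + GAMMA * running
        returns.append(running)
    return list(reversed(returns))

# Example: a toy policy that always taps "OK"
toy_policy = lambda obs, history: "tap(OK)"
traj = [Step("home screen", "tap(Settings)"), Step("settings", "tap(OK)")]
print(semi_online_rollout(toy_policy, traj))  # [0.9, 1.0]
```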

Why it matters?

This work is important because it makes it possible to train programs that can automate complex tasks on apps more efficiently and effectively. The new method significantly improves performance compared to existing techniques, bringing us closer to having AI assistants that can reliably handle multi-step tasks within user interfaces, like booking a flight or completing a level in a game.

Abstract

Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution for lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
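
For intuition about the 'weighted step-level and episode-level advantages' mentioned in the abstract, the sketch below shows one plausible way to blend the two signals. The alpha weight, the group baseline, and the normalizations are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def weighted_advantages(step_returns, group_episode_returns, alpha=0.5):
    """Blend step-level and episode-level advantages (illustrative sketch).
    step_returns: discounted returns for each step of one rollout.
    group_episode_returns: episode returns of other rollouts, used as a baseline.
    alpha, the baseline, and the normalizations are assumptions."""
    step_returns = np.asarray(step_returns, dtype=float)
    group = np.asarray(group_episode_returns, dtype=float)

    # Step-level advantage: each step's return relative to this rollout's own steps.
    step_adv = (step_returns - step_returns.mean()) / (step_returns.std() + 1e-8)

    # Episode-level advantage: the rollout's total return relative to the group,
    # applied uniformly to every step of the trajectory.
    episode_adv = (step_returns.sum() - group.mean()) / (group.std() + 1e-8)

    return alpha * step_adv + (1.0 - alpha) * episode_adv

# Example: a rollout whose episode return beats the group average gets a
# positive episode-level term added to every step's advantage.
print(weighted_advantages([0.9, 1.0], group_episode_returns=[1.2, 1.5, 2.3]))
```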