GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Rui Yang, Qianhui Wu, Zhaoyang Wang, Hanyang Chen, Ke Yang, Hao Cheng, Huaxiu Yao, Baoling Peng, Huan Zhang, Jianfeng Gao, Tong Zhang

2026-02-26

Summary

This paper focuses on improving how well open-source GUI agents, AI programs that operate graphical interfaces, can perform complex tasks that require multiple steps, like navigating websites or apps. Currently, these open-source agents lag well behind the closed-source systems developed by big companies.

What's the problem?

The main issue is that open-source agents lack enough good examples of how to reason through actions and successfully complete tasks. Standard methods for improving these agents, like teaching them to 'think step-by-step' or using reinforcement learning, don't transfer well to GUI agents. The 'think step-by-step' approach can actually make them worse at grounding (locating the right on-screen element), and reinforcement learning struggles because feedback is checked against a single demonstrated action even though several different actions could all be correct, so an agent can be penalized for a perfectly valid choice.
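The verification problem above can be made concrete with a toy sketch. The action strings and scenario here are illustrative assumptions, not examples from the paper:

```python
# Toy illustration of the partial-verifiability problem: step-wise
# verification rewards only the one demonstrated action, even when
# other actions would also make progress toward the goal.

def exact_match_reward(predicted_action: str, demonstrated_action: str) -> float:
    """Step-wise check: reward 1 only if the agent reproduces the
    single demonstrated action, 0 otherwise."""
    return 1.0 if predicted_action == demonstrated_action else 0.0

# On a settings page, several actions could all reach the target screen:
candidate_actions = [
    "click(search_bar)",   # the demonstrated action
    "click(menu_button)",  # a different route to the same screen
    "scroll(down)",        # reveals the same control further down
]

demonstrated = "click(search_bar)"
rewards = [exact_match_reward(a, demonstrated) for a in candidate_actions]
print(rewards)  # → [1.0, 0.0, 0.0]
```

Only the demonstrated action scores 1.0; the other valid actions are penalized, which is why offline step-wise accuracy can be a weak predictor of online task success.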

What's the solution?

The researchers developed a training recipe called GUI-Libra to address these problems. First, they built a data construction and filtering pipeline and released a dataset of 81,000 examples showing the reasoning behind actions in GUI environments. Second, they changed how the agent is fine-tuned: mixing reasoning-then-action examples with direct-action examples, and weighting the training loss to emphasize the action and grounding tokens so reasoning doesn't crowd out accuracy. Finally, they stabilized the reinforcement learning stage by keeping the agent's behavior within a trust region of its starting policy (via KL regularization) and by downweighting unreliable negative feedback on tasks the agent rarely solves.
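The token-reweighting idea in the fine-tuning step can be sketched as a weighted cross-entropy loss. This is a minimal illustration, assuming per-token log-probabilities and a boolean mask over action/grounding tokens; the weight value and masking scheme here are assumptions, not the paper's exact recipe:

```python
import numpy as np

def action_aware_loss(log_probs, labels, action_mask, action_weight=2.0):
    """Token-level negative log-likelihood where action/grounding tokens
    get a larger weight than reasoning (CoT) tokens.

    log_probs:   (T, V) log-softmax outputs over the vocabulary
    labels:      (T,)   target token ids
    action_mask: (T,)   bool, True on action/grounding tokens
    """
    # NLL of each target token under the model
    nll = -log_probs[np.arange(len(labels)), labels]
    # Upweight the action span so grounding is not drowned out by CoT text
    weights = np.where(action_mask, action_weight, 1.0)
    return float((nll * weights).sum() / weights.sum())
```

Mixing reasoning-then-action and direct-action sequences, with this kind of reweighting applied to both, is what lets the model keep its grounding accuracy while still learning to reason.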

Why it matters?

This work is important because it shows that you can significantly improve the performance of open-source GUI agents without needing to collect a ton of new data through trial and error. By carefully curating data and improving the learning process, they were able to achieve better results, making these agents more useful and accessible for a wider range of applications.

Abstract

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-style training faces partial verifiability, where multiple actions can be correct but only a single demonstrated action is used for verification. This makes offline step-wise metrics weak predictors of online task success. In this work, we present GUI-Libra, a tailored training recipe that addresses these challenges. First, to mitigate the scarcity of action-aligned reasoning data, we introduce a data construction and filtering pipeline and release a curated 81K GUI reasoning dataset. Second, to reconcile reasoning with grounding, we propose action-aware SFT that mixes reasoning-then-action and direct-action data and reweights tokens to emphasize action and grounding. Third, to stabilize RL under partial verifiability, we identify the overlooked importance of KL regularization in RLVR and show that a KL trust region is critical for improving offline-to-online predictability; we further introduce success-adaptive scaling to downweight unreliable negative gradients. Across diverse web and mobile benchmarks, GUI-Libra consistently improves both step-wise accuracy and end-to-end task completion. Our results suggest that carefully designed post-training and data curation can unlock significantly stronger task-solving capabilities without costly online data collection. We release our dataset, code, and models to facilitate further research on data-efficient post-training for reasoning-capable GUI agents.
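The two RL stabilizers the abstract names, a KL trust region and success-adaptive scaling, can be sketched together in a per-sample policy-gradient objective. The function name, baseline, and exact scaling rule below are assumptions for illustration, not the paper's released implementation:

```python
import numpy as np

def rlvr_step_objective(logp_new, logp_old, logp_ref, reward,
                        task_success_rate, kl_coef=0.05):
    """Sketch of a step-wise RLVR objective with two stabilizers:
    a KL penalty toward the reference policy (trust region), and
    success-adaptive scaling of negative advantages."""
    advantage = reward - 0.5  # simple baseline for a 0/1 reward
    if advantage < 0:
        # Success-adaptive scaling: on tasks the policy rarely solves,
        # a non-matching action may still be correct (partial
        # verifiability), so shrink the negative gradient in
        # proportion to the observed success rate.
        advantage *= task_success_rate
    ratio = np.exp(logp_new - logp_old)        # importance ratio
    kl_to_ref = logp_new - logp_ref            # per-sample KL estimate
    return ratio * advantage - kl_coef * kl_to_ref
```

With this shape, a "failure" on a task the agent almost never solves contributes little gradient, while the KL term keeps updates near the SFT reference policy, which is the mechanism the abstract credits for better offline-to-online predictability.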