UI-Venus-1.5 Technical Report

Venus-Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou

2026-02-11

Summary

This paper introduces UI-Venus-1.5, a new and improved computer program designed to automate tasks within graphical user interfaces, like apps on your phone or computer.

What's the problem?

Automating tasks on computers and phones is hard because programs need to understand what you want *and* be able to handle different situations and apps. Existing programs either work well in limited cases or struggle with the variety of real-world digital environments. It's difficult to create a single program that can reliably perform many different tasks across many different apps.

What's the solution?

The researchers created UI-Venus-1.5, which comes in three sizes (2 billion, 8 billion, and 30 billion parameters) to suit different needs. They improved it in three main ways: first, they trained it on a large corpus (about 10 billion tokens from more than 30 datasets) showing how GUIs work; second, they used reinforcement learning, where the program learns by attempting whole tasks and being rewarded for success, which helps most on complex, multi-step tasks; and third, they merged separate specialized models (for locating on-screen elements, navigating web pages, and operating mobile apps) into one unified model. This merging step is what lets a single model cover all three areas.

Why does it matter?

UI-Venus-1.5 is a significant step forward because it sets new state-of-the-art results on standard benchmarks such as ScreenSpot-Pro, VenusBench-GD, and AndroidWorld and, importantly, also works well with real-world apps, including a variety of Chinese mobile apps. This means it could eventually lead to more helpful and reliable automation tools, making computers and phones easier to use.

Abstract

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
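
The abstract does not say which merging algorithm is used, so the following is only a minimal sketch of one common approach, weighted linear averaging of parameters, and should not be read as the authors' method. The function name `merge_checkpoints`, the list-of-floats stand-in for tensors, and the equal weights are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Weighted linear average of checkpoints with identical parameter layouts.

    This is an assumed merging rule for illustration, not the report's recipe.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    names = set(checkpoints[0])
    merged: StateDict = {}
    for ckpt in checkpoints:
        if set(ckpt) != names:
            raise ValueError("checkpoints must share the same parameter names")
    for name in checkpoints[0]:
        params = [ckpt[name] for ckpt in checkpoints]
        size = len(params[0])
        if any(len(p) != size for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across the domain-specific models.
        merged[name] = [
            sum(w * p[i] for w, p in zip(norm, params)) for i in range(size)
        ]
    return merged


# Toy domain-specific "models" (values are made up for the example).
grounding = {"layer.weight": [1.0, 2.0]}
web = {"layer.weight": [3.0, 4.0]}
mobile = {"layer.weight": [5.0, 6.0]}

merged = merge_checkpoints([grounding, web, mobile], [1.0, 1.0, 1.0])
```

With equal weights, each merged value is the plain mean of the three inputs. In practice the weights would be tuned per domain, and other merging schemes exist that handle conflicting parameter updates differently.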
