
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, Zhiyuan Chen, Jitong Liao, Qi Zheng, Jiahui Zeng, Ze Xu, Shuai Bai, Junyang Lin, Jingren Zhou, Ming Yan

2026-02-20


Summary

This paper introduces GUI-Owl-1.5, a new artificial intelligence model designed to interact with computer interfaces the way a human would. It comes in several sizes (from 2 to 235 billion parameters) and works on phones, desktop computers, and web browsers, letting cloud systems and the devices you use every day work together in real time.

What's the problem?

Existing AI models struggled to reliably understand and interact with graphical user interfaces (GUIs) – things like buttons, menus, and windows – across different platforms. They weren't very good at automating tasks, understanding what was on the screen, using tools within applications, or remembering information needed for complex interactions. Training these models was also difficult and inefficient, especially when dealing with multiple platforms and tasks that require many steps.

What's the solution?

The researchers developed GUI-Owl-1.5 using a few key innovations. First, they built a "data flywheel" for collecting training data, combining simulated environments with cloud-based sandboxes to gather UI-understanding data and interaction trajectories efficiently. Second, they strengthened the AI's reasoning, with particular focus on using tools, remembering information, and adapting to multi-agent setups. Finally, they designed a new reinforcement-learning method, called MRPO, to handle conflicts between different platforms and the inefficiency of training on long, multi-step tasks.
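To make "interacting with a GUI like a human" concrete, here is a minimal sketch of the perceive-act loop such agents run: take a screenshot, let the model propose an action, execute it, and repeat. Everything in it (the ScreenObservation and Action types, parse_action, and the model and env objects) is illustrative, not GUI-Owl-1.5's actual interface.

```python
# Illustrative perceive-act loop for a GUI agent.
# All names here are hypothetical -- NOT GUI-Owl-1.5's real API.
from dataclasses import dataclass

@dataclass
class ScreenObservation:
    screenshot_png: bytes   # raw pixels the model "sees"
    platform: str           # e.g. "android", "desktop", "web"

@dataclass
class Action:
    kind: str               # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def parse_action(model_output: str) -> Action:
    """Parse the model's text output into a structured action.
    A real agent would enforce a stricter grammar; this is a toy parser."""
    parts = model_output.strip().split()
    if not parts:
        return Action("done")
    if parts[0] == "click":
        return Action("click", x=int(parts[1]), y=int(parts[2]))
    if parts[0] == "type":
        return Action("type", text=" ".join(parts[1:]))
    return Action("done")

def run_episode(model, env, task: str, max_steps: int = 30) -> bool:
    """Screenshot -> model proposes an action -> execute -> repeat.
    `model` and `env` stand in for the trained model and a device sandbox."""
    for _ in range(max_steps):
        obs: ScreenObservation = env.observe()
        output = model.generate(task=task, observation=obs)
        action = parse_action(output)
        if action.kind == "done":
            return True
        env.execute(action)
    return False  # ran out of steps without finishing the task
```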

Why it matters?

GUI-Owl-1.5 represents a significant step forward in AI’s ability to interact with the digital world as humans do. This has huge potential for automating tasks on your computer or phone, creating more accessible technology, and building AI assistants that can truly understand and help with everyday digital activities. The fact that the model is open-source means other researchers and developers can build upon this work, accelerating progress in the field.

Abstract

The paper introduces GUI-Owl-1.5, the latest native GUI agent model, which features instruct/thinking variants in multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. GUI-Owl-1.5 achieves state-of-the-art results among open-source models on more than 20 GUI benchmarks: (1) on GUI automation tasks, it obtains 56.5 on OSWorld, 71.6 on AndroidWorld, and 48.4 on WebArena; (2) on grounding tasks, it obtains 80.3 on ScreenSpotPro; (3) on tool-calling tasks, it obtains 47.6 on OSWorld-MCP and 46.8 on MobileWorld; (4) on memory and knowledge tasks, it obtains 75.5 on GUI-Knowledge Bench. GUI-Owl-1.5 incorporates several key innovations: (1) Hybrid Data Flywheel: we construct a data pipeline for UI understanding and trajectory generation based on a combination of simulated environments and cloud-based sandbox environments, improving the efficiency and quality of data collection; (2) Unified Enhancement of Agent Capabilities: we use a unified thought-synthesis pipeline to enhance the model's reasoning capabilities, with particular emphasis on key agent abilities, including Tool/MCP use, memory, and multi-agent adaptation; (3) Multi-platform Environment RL Scaling: we propose a new environment RL algorithm, MRPO, to address the challenges of multi-platform conflicts and the low training efficiency of long-horizon tasks. The GUI-Owl-1.5 models are open-sourced, and an online cloud-sandbox demo is available at https://github.com/X-PLUG/MobileAgent.
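The abstract names MRPO but does not spell out its mechanics. As a point of reference, the sketch below shows a generic GRPO-style grouped-advantage computation, the family of environment-RL objectives that a multi-platform method like MRPO plausibly builds on: each task is rolled out several times, and each trajectory's reward is normalized only against its own group, which keeps reward scales from different tasks (or platforms) from conflicting. The per-task grouping and function names here are assumptions, not the paper's algorithm.

```python
# NOT MRPO (the paper does not define it here) -- a minimal GRPO-style
# grouped-advantage sketch, assuming each task is rolled out several
# times and rewarded once per trajectory.
import numpy as np

def grouped_advantages(rewards_per_task: list[list[float]]) -> list[np.ndarray]:
    """Normalize each task's trajectory rewards against that task's own
    group mean/std, so trajectories are only compared to rollouts of the
    same task -- one common way to avoid cross-task reward-scale conflicts."""
    advantages = []
    for rewards in rewards_per_task:
        r = np.asarray(rewards, dtype=np.float64)
        adv = (r - r.mean()) / (r.std() + 1e-8)  # ~zero advantage if all rewards tie
        advantages.append(adv)
    return advantages

# Example: two tasks, each rolled out 4 times (possibly on different platforms).
print(grouped_advantages([[1.0, 0.0, 1.0, 0.0], [0.2, 0.2, 0.9, 0.2]]))
```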