OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, Anh Tuan Luu, Jianbing Zhang, Lewei Lu, Dahua Lin

2026-04-23

Summary

This paper introduces OpenMobile, a new open-source system for creating and training 'mobile agents': AI that can operate a smartphone to complete tasks. These agents combine vision (what they 'see' on the screen) with language (understanding instructions), and recent advances have made them quite capable. However, the data used to train the best of them is usually kept secret.

What's the problem?

Currently, the best mobile agents are trained on datasets that aren't publicly available, and the methods used to create those datasets and train the agents aren't well understood. This makes it hard for other researchers to build upon this work or even compare their own agents fairly. It's like trying to learn from a master chef who won't share their recipe or ingredients!

What's the solution?

The researchers created OpenMobile, a framework with two main parts. First, it automatically builds a 'memory' of what an app looks like and how it works by exploring it, then uses this memory to generate lots of diverse task instructions along with the steps needed to complete them. Second, it uses a policy-switching training technique: while collecting trajectories, control alternates between the agent being trained (which makes mistakes and learns to recover from them) and an 'expert' model (which demonstrates the right moves). They then used this system to train models like Qwen2.5-VL and Qwen3-VL.
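The policy-switching idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `env`, `learner`, and `expert` interfaces and the stochastic switching rule are assumptions made for the sketch.

```python
import random

def rollout_with_policy_switching(env, learner, expert,
                                  max_steps=30, switch_prob=0.3):
    """Collect one trajectory, alternating between learner and expert.

    Letting the learner act (and err) before handing control to the
    expert captures the error-recovery behaviour that trajectories made
    of pure expert demonstrations tend to miss. All interfaces here are
    hypothetical stand-ins for the paper's actual rollout machinery.
    """
    trajectory = []
    obs = env.reset()
    use_expert = False
    for _ in range(max_steps):
        policy = expert if use_expert else learner
        action = policy.act(obs)
        next_obs, done = env.step(action)
        trajectory.append({
            "observation": obs,
            "action": action,
            "source": "expert" if use_expert else "learner",
        })
        if done:
            break
        # Stochastically hand control to the expert so it can
        # demonstrate recovery from the learner's earlier mistakes.
        if not use_expert and random.random() < switch_prob:
            use_expert = True
        obs = next_obs
    return trajectory
```

In this sketch the hand-off is a simple coin flip; a real pipeline would more likely switch when the learner is detected to be off-track, but the resulting trajectory, mixing learner steps and expert corrections, is the kind of data the paper argues is missing from standard imitation learning.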

Why it matters?

OpenMobile is important because it provides the tools and data needed for more researchers to work on mobile agents. By making everything open-source, it allows others to understand how these agents are built, improve upon the existing methods, and ultimately create even more capable AI that can help us with tasks on our phones. The results show their open data approach can achieve performance comparable to closed-source systems, and they’ve shown it’s not just memorizing the test tasks.

Abstract

Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration, then leverages it to generate diverse and grounded instructions; and (2) a policy-switching strategy for trajectory rollout. By alternating between learner and expert models, it captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses on the overlap between our synthetic instructions and benchmark test sets, and verify that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
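The first component, building an environment memory from exploration and then grounding task instructions in it, can be sketched as follows. Everything here is a toy stand-in: the `env` interface, the memory layout, and the template-based task generation are assumptions; the paper's pipeline uses model-driven exploration and instruction generation.

```python
from collections import defaultdict

def explore_app(env, max_steps=50):
    """Build a simple environment memory by walking through an app.

    The memory maps each screen to the UI elements observed on it and
    to the screens its actions lead to. The `env` methods used here are
    hypothetical, chosen only to make the idea concrete.
    """
    memory = defaultdict(lambda: {"elements": set(), "transitions": {}})
    screen = env.current_screen()
    for _ in range(max_steps):
        elements = env.list_elements(screen)
        memory[screen]["elements"].update(elements)
        if not elements:
            break  # dead-end screen: nothing left to interact with
        action = env.pick_action(elements)
        next_screen = env.perform(action)
        memory[screen]["transitions"][action] = next_screen
        screen = next_screen
    return memory

def synthesize_tasks(memory):
    """Turn memory entries into grounded task instructions.

    A template stands in for the model-driven generation described in
    the paper; the point is that every instruction refers to a screen
    and element actually observed during exploration.
    """
    tasks = []
    for screen, info in memory.items():
        for element in sorted(info["elements"]):
            tasks.append(f"Open the {screen} screen and interact with '{element}'")
    return tasks
```

Because instructions are derived from the explored memory rather than written from scratch, they stay grounded in functionality the app actually exposes, which is what lets the synthesis scale while avoiding infeasible tasks.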