UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li
2025-05-28
Summary
This paper introduces UI-Genie, a framework that helps AI agents get better at understanding and operating mobile app screens by repeatedly learning from their own experience.
What's the problem?
AI agents that operate mobile app interfaces often make mistakes. Working out the right action from a screenshot and its on-screen text is hard, especially because apps can look very different from one another.
What's the solution?
The researchers built a reward model that takes both images and text as input to judge whether an action is good or bad, and they set up a pipeline in which the agent keeps improving over time by learning from its successes and failures.
Why does it matter?
This matters because agents that navigate and use mobile apps more reliably can help with accessibility, automation, and making apps easier for everyone to use.
Abstract
The UI-Genie framework addresses GUI agent challenges through a reward model with an image-text architecture and a self-improvement pipeline, achieving state-of-the-art performance on multiple benchmarks.
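As a rough illustration of how a reward model can drive self-improvement, the Python sketch below filters an agent's proposed actions by a reward score and keeps only the accepted ones as new training data. Everything in it is hypothetical and not taken from the paper: the `Step` record, the `toy_reward` function, the `filter_by_reward` helper, and the 0.5 threshold are assumptions made for illustration. In particular, the toy reward is a plain string match, whereas the reward model described above uses an image-text architecture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One proposed agent step (hypothetical structure, for illustration only)."""
    screenshot: str   # stand-in for an image; a real system would pass pixels
    instruction: str  # the task text given to the agent
    action: str       # the action the agent proposed, e.g. "tap:settings"


def toy_reward(step: Step) -> float:
    """Hypothetical reward: 1.0 if the action's target is named in the instruction.

    This string match only stands in for a learned reward model and
    ignores the screenshot entirely.
    """
    target = step.action.split(":", 1)[-1]
    return 1.0 if target in step.instruction.lower() else 0.0


def filter_by_reward(
    candidates: List[Step],
    reward: Callable[[Step], float],
    threshold: float = 0.5,  # assumed cutoff, not a value from the paper
) -> List[Step]:
    """Keep only the steps that the reward function scores at or above the threshold."""
    return [step for step in candidates if reward(step) >= threshold]


# Two candidate actions for the same screen and instruction.
candidates = [
    Step("home_screen", "Open settings", "tap:settings"),
    Step("home_screen", "Open settings", "tap:camera"),
]

# Accepted steps would be added to the agent's next round of training data.
training_set = filter_by_reward(candidates, toy_reward)
```

In an iterative setup, the agent would be retrained on `training_set`, asked to propose new steps, and filtered again, so that each round starts from a slightly better agent.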