ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

Ning Li, Qiqiang Lin, Zheng Wu, Xiaoyun Mo, Weiming Zhang, Yin Zhao, Xiangmou Qu, Jiamu Zhou, Jun Wang, Congmin Zheng, Yuanyi Song, Hongjiang Chen, Heyuan Huang, Jihong Wang, Jiaxin Yin, Jingwei Yu, Junwei Liao, Qiuying Peng, Xingyu Lou, Jun Wang, Weiwen Liu, Zhuosheng Zhang

2025-10-23

Summary

This paper introduces ColorAgent, a new type of operating system agent that aims to make interacting with your phone or computer feel more like having a helpful assistant rather than just giving commands. It's a step towards operating systems that understand what you *want* to do, not just what you *tell* them to do.

What's the problem?

Traditionally, interacting with computers meant typing specific commands. Graphical user interfaces improved on that, but people now want even more intuitive ways to use technology. The challenge is building an AI agent that can reliably carry out complex tasks spanning many steps (long-horizon interaction), understand what a user actually intends, and proactively offer help without being annoying or making mistakes. Existing methods struggle to complete tasks consistently and to adapt to individual user preferences.

What's the solution?

The researchers created ColorAgent and trained it using two key techniques. First, they used step-wise reinforcement learning, which teaches the model through trial and error and rewards it for successful actions. Second, they used self-evolving training, in which the model iteratively improves itself over time. To make the agent work well in different situations, they designed a multi-agent framework in which several 'agents' work together, aiming for generality, consistency, and robustness. Finally, they focused on making the agent recognize each user's personal way of expressing intent and proactively engage with the user to anticipate their needs.
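To make the idea of step-wise rewards more concrete, here is a minimal Python sketch. It is not ColorAgent's training code: the `Action` type, the reward values (1.0, 0.5, 0.0), and the comparison against a reference action are all assumptions chosen for illustration. The point is only that each step of a task receives its own score, instead of a single score for the whole task.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """Hypothetical GUI action; the field names are illustrative assumptions."""
    kind: str         # e.g. "tap", "type", "swipe"
    target: str = ""  # hypothetical UI element identifier
    text: str = ""    # text entered, if any


def step_reward(predicted: Action, reference: Action) -> float:
    """Score one step: 1.0 for an exact match, 0.5 for the right kind of
    action with wrong details, 0.0 otherwise (assumed values)."""
    if predicted.kind != reference.kind:
        return 0.0
    if predicted == reference:
        return 1.0
    return 0.5


def trajectory_rewards(predicted: List[Action], reference: List[Action]) -> List[float]:
    """Return one reward per step rather than one reward per task."""
    return [step_reward(p, r) for p, r in zip(predicted, reference)]


if __name__ == "__main__":
    reference = [
        Action("tap", "search_box"),
        Action("type", "search_box", "pizza"),
        Action("tap", "submit"),
    ]
    predicted = [
        Action("tap", "search_box"),
        Action("type", "search_box", "pizza"),
        Action("tap", "cancel"),  # right kind of action, wrong target
    ]
    print(trajectory_rewards(predicted, reference))
```

With per-step scores like these, a learning algorithm can tell which action in a long task went wrong, which is harder when only the final outcome is rewarded.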

Why does it matter?

This work matters because it advances the effort to make computers more user-friendly and helpful. ColorAgent reaches success rates of 77.2% on the AndroidWorld benchmark and 50.7% on AndroidLab, which the authors report as a new state of the art. However, the researchers also point out that current benchmarks are insufficient for a comprehensive evaluation of OS agents, and they identify evaluation paradigms, agent collaboration, and security as directions for future work.

Abstract

With the advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from the command-line interface to the rapidly emerging AI agent interactions. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, while also developing a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents and propose further exploring directions in future work, particularly in the areas of evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.
