OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong

2025-11-03

Summary

This research focuses on the safety of AI programs, called agents, that use both vision (seeing what's on a screen) and language (understanding instructions) to operate devices like smartphones. These agents are getting really good at using apps, but there's a risk they could do unsafe things, like compromising your phone's security or leaking private information.

What's the problem?

As these agents become more capable, it's hard to predict all the ways they might cause problems. Mobile environments, like your phone, are incredibly complex, making it difficult to test for every possible safety issue. There wasn't a good way to systematically find and address these risks before this work.

What's the solution?

The researchers created a special testing environment called MobileRisk-Live, a sandbox that lets them safely observe how these agents behave, along with a benchmark of realistic agent trajectories annotated for safety issues. They also developed a detection framework called OS-Sentinel that works in two parts: a Formal Verifier checks whether the agent violates explicit rules about system-level behavior (such as changing security settings), while a VLM-based Contextual Judge uses a vision-language model to understand the context of what the agent is doing and flag risks that aren't captured by the rules alone. By combining these two approaches, OS-Sentinel detects more safety issues than either part on its own.
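The two-part idea can be illustrated with a short sketch. This is not the paper's implementation: the rule names, the `AgentAction` type, and the keyword-based stand-in for the VLM judge are all hypothetical, chosen only to show how a formal rule check and a contextual judge can be combined into one hybrid detector.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AgentAction:
    """One step in an agent trajectory: the UI action plus resulting system events."""
    description: str          # e.g. "tap 'Grant permission'"
    system_events: List[str]  # e.g. ["SETTINGS_CHANGED:location"]

# --- Part 1: formal verifier ------------------------------------------
# Explicit, rule-based checks over system-level events. These example
# rules are illustrative, not the paper's actual rule set.
FORMAL_RULES: List[Callable[[AgentAction], bool]] = [
    lambda a: any(e.startswith("UNINSTALL:") for e in a.system_events),
    lambda a: any(e.startswith("SEND_SMS:") for e in a.system_events),
]

def formal_verifier(action: AgentAction) -> bool:
    """Return True if any explicit system-level rule is violated."""
    return any(rule(action) for rule in FORMAL_RULES)

# --- Part 2: contextual judge -----------------------------------------
# Stand-in for a VLM call: in the real framework this step would prompt
# a vision-language model with the screenshot and the action's context.
def contextual_judge(action: AgentAction) -> bool:
    risky_phrases = ("share contacts", "disable lock screen")
    return any(p in action.description.lower() for p in risky_phrases)

# --- Hybrid detection --------------------------------------------------
def is_unsafe(action: AgentAction) -> bool:
    """Flag an action if either the formal or the contextual check trips."""
    return formal_verifier(action) or contextual_judge(action)

safe = AgentAction("tap 'Open calendar'", [])
risky = AgentAction("tap 'Share contacts with app'", [])
print(is_unsafe(safe), is_unsafe(risky))  # False True
```

The design point is that the two checks catch different failures: the verifier is precise but only covers violations that can be stated as explicit rules, while the judge can reason about intent and screen context but may miss low-level system effects, so the hybrid takes the union of both.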

Why it matters?

This research is important because it provides the tools and understanding needed to build safer and more trustworthy AI agents for mobile devices. As these agents become more common, ensuring their safety is crucial to protect users' privacy and security, and to allow us to confidently use them for automation.

Abstract

Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. While these agents hold great promise for advancing digital automation, their potential for unsafe operations, such as system compromise and privacy leakage, is raising significant concerns. Detecting these safety concerns across the vast and complex operational space of mobile environments presents a formidable challenge that remains critically underexplored. To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark comprising realistic trajectories with fine-grained annotations. Built upon this, we propose OS-Sentinel, a novel hybrid safety detection framework that synergistically combines a Formal Verifier for detecting explicit system-level violations with a VLM-based Contextual Judge for assessing contextual risks and agent actions. Experiments show that OS-Sentinel achieves 10%-30% improvements over existing approaches across multiple metrics. Further analysis provides critical insights that foster the development of safer and more reliable autonomous mobile agents.