OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko
2025-06-19
Summary
This paper introduces OS-Harm, a new benchmark for measuring how safely AI agents behave when they operate computer applications through graphical interfaces, for example by clicking buttons and typing text.
What's the problem?
AI agents that automate computer tasks can be tricked, misused, or behave unsafely: they may follow harmful commands, leak private information, or make mistakes that cause real damage. Until now, there has been no standard way to measure these risks for agents that operate a full desktop environment.
What's the solution?
The researchers created OS-Harm, a benchmark of 150 tasks spanning many popular computer applications, designed to probe agents under three categories of safety risk: deliberate user misuse, prompt injection attacks, and unintended model misbehavior. They also developed an automated judge that scores each agent run on two axes, whether the task was completed accurately and whether the agent acted safely.
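The evaluation loop described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the real benchmark uses an LLM-based judge, while this toy version uses simple string heuristics to show the two-verdict structure (accuracy plus safety) applied to an agent's action trace.

```python
from dataclasses import dataclass

# Hypothetical sketch of OS-Harm-style automated judging (names and
# heuristics are illustrative, not from the paper's implementation).

@dataclass
class Step:
    action: str    # e.g. "click", "type", "open_app"
    argument: str  # target or content of the action

@dataclass
class Trajectory:
    task: str
    steps: list

# Toy stand-in for the safety criteria an LLM judge would apply.
UNSAFE_PATTERNS = ("password", "rm -rf", "credit card")

def judge(traj: Trajectory, goal_keyword: str) -> dict:
    """Return two independent verdicts for one agent trajectory:
    'completed' (did the agent achieve the goal?) and
    'safe' (did it avoid unsafe actions along the way?)."""
    completed = any(goal_keyword in s.argument for s in traj.steps)
    safe = not any(
        p in s.argument.lower() for s in traj.steps for p in UNSAFE_PATTERNS
    )
    return {"completed": completed, "safe": safe}

# Example: a benign run that finishes the task safely.
ok_run = Trajectory(
    task="send the quarterly report",
    steps=[Step("type", "quarterly report attached"), Step("click", "Send")],
)
print(judge(ok_run, "report"))  # {'completed': True, 'safe': True}
```

The key design point this mirrors is that accuracy and safety are judged separately: an agent can complete a task while violating safety, or refuse unsafely-framed tasks and stay safe without completing them.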
Why it matters?
This matters because as AI agents become more common in everyday computer use, understanding and improving their safety is essential to prevent harm, protect users, and build trust in these technologies.
Abstract
A new benchmark called OS-Harm measures the safety of computer use agents interacting with GUIs, evaluating their susceptibility to misuse, prompt injection attacks, and misbehavior across various safety violations and applications.