Morae: Proactively Pausing UI Agents for User Choices
Yi-Hao Peng, Dingzeyu Li, Jeffrey P. Bigham, Amy Pavel
2025-09-01
Summary
This paper introduces Morae, a UI agent (a program that operates software and web interfaces on a person's behalf) designed to help people who are blind or have low vision use computers and the internet more easily.
What's the problem?
Current UI agents try to do everything automatically, which can be frustrating for users because they don't get to make choices or even know what's happening. Imagine asking an agent to buy something and it just picks one option without telling you about other similar, potentially better, choices. This takes away the user's control and ability to get exactly what they want.
What's the solution?
The researchers built Morae to work differently. Instead of doing everything on its own, Morae automatically identifies decision points while carrying out a task and pauses there so the user can choose. It uses large multimodal models to interpret the user's request together with the page's UI code and screenshots, and it asks the user for clarification when there is a choice to be made. This way, the user stays in control and can make informed decisions.
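To make the "pause at a decision point" idea concrete, here is a minimal sketch built on the sparkling-water scenario from the abstract, in which several products are tied for cheapest. It is not Morae's implementation. Morae detects decision points with large multimodal models reading UI code and screenshots, while this sketch substitutes a fixed "lowest price" rule so the control flow is easy to follow. The `Option` schema, the names `cheapest_options` and `choose_with_pause`, the `ask_user` callback, and all product names, prices, and ratings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Option:
    """One product the agent found on the page (hypothetical schema)."""
    name: str
    price_cents: int  # integer cents so equal prices compare exactly
    flavor: str
    rating: float


# Hypothetical callback: shows the user a question plus candidates, returns their pick.
AskUser = Callable[[str, List[Option]], Option]


def cheapest_options(options: List[Option]) -> List[Option]:
    """Return every option tied for the lowest price."""
    if not options:
        raise ValueError("no options to choose from")
    lowest = min(o.price_cents for o in options)
    return [o for o in options if o.price_cents == lowest]


def choose_with_pause(options: List[Option], ask_user: AskUser) -> Option:
    """Resolve 'buy the cheapest one', pausing when the request is ambiguous."""
    candidates = cheapest_options(options)
    if len(candidates) == 1:
        # The request singles out exactly one option, so keep automating.
        return candidates[0]
    # Decision point: several options satisfy the request equally well.
    # Rather than picking one silently, describe how they differ and let the user decide.
    details = "; ".join(
        f"{o.name} ({o.flavor}, rated {o.rating})" for o in candidates
    )
    question = (
        f"{len(candidates)} options share the lowest price "
        f"(${candidates[0].price_cents / 100:.2f}): {details}. Which one do you want?"
    )
    return ask_user(question, candidates)


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper's study.
    waters = [
        Option("Brand A sparkling water", 399, "lemon", 4.2),
        Option("Brand B sparkling water", 399, "lime", 4.7),
        Option("Brand C sparkling water", 449, "plain", 4.5),
    ]

    def simulated_user(question: str, candidates: List[Option]) -> Option:
        # Stand-in for a real prompt (for example, one read aloud by a screen reader).
        # This simulated user happens to prefer the best-rated option.
        print(question)
        return max(candidates, key=lambda o: o.rating)

    print("Chosen:", choose_with_pause(waters, simulated_user).name)
```

Passing `ask_user` in as a callback keeps the decision logic separate from how the question is delivered, so the same function could be driven by a screen-reader dialog, a voice prompt, or a test stub. When the request already determines a single option, the agent never interrupts the user, which reflects the mixed-initiative balance between automation and user choice.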
Why does it matter?
In a study of real-world web tasks with blind and low-vision participants, Morae helped users complete more tasks and choose options that better matched their preferences than baseline agents, including OpenAI Operator. More broadly, the work shows the value of a 'mixed-initiative' approach, in which the computer automates the routine steps but the user still has the final say on the choices that matter.
Abstract
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.