Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, Yiheng Xu, Junli Wang, Doyen Sahoo, Tao Yu, Caiming Xiong

2025-05-20

Summary

This paper presents a new way to help AI understand and operate computer interfaces, such as windows and buttons, by breaking those interfaces down into their parts and building training data from them.

What's the problem?

AI struggles to use computer programs and websites the way humans do. These interfaces can be complicated, and there has not been enough good data for training AI on these tasks.

What's the solution?

To solve this, the researchers created a benchmark called OSWorld-G and a dataset called Jedi. The benchmark measures how well an AI can find the right element on screen, and the dataset supplies many training examples of interactions across different computer interfaces. Together they help AI learn to handle more complex tasks on a computer.

Why does it matter?

This matters because AI that can use computers the way humans do could lead to smarter digital assistants and more helpful tools, making technology easier to use for everyone.

Abstract

A new benchmark, OSWorld-G, and a new dataset, Jedi, improve GUI grounding by capturing complex interactions, leading to better performance in computer-use agents.