CoAct-1: Computer-using Agents with Coding as Actions

Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong

2025-08-08

Summary

This paper presents CoAct-1, a multi-agent system that controls computer programs both by clicking and typing in graphical user interfaces (GUIs) and by running code to get tasks done.

What's the problem?

Automating complex computer tasks is hard because a single mode of action, whether clicking buttons or writing code, is often not enough for an AI to handle every task efficiently and accurately.

What's the solution?

The authors built a system in which AI agents treat both GUI interactions and code as actions. The agents work together and choose the more suitable way to act at each step, which lets them complete tasks more effectively.
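
To make the idea of choosing between GUI and code actions at each step more concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the names Step, gui_agent, coding_agent, AGENTS, and run_task are all hypothetical, the "gui"/"code" labels are assumed to come from some upstream planner, and the two agents are stand-ins that only return a log line instead of touching a real desktop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One subtask in a plan (hypothetical structure, not from the paper)."""
    description: str  # what the subtask should accomplish
    kind: str         # assumed label: "gui" or "code"


def gui_agent(step: Step) -> str:
    # Stand-in for an agent that would click and type in a GUI.
    return f"[gui] {step.description}"


def coding_agent(step: Step) -> str:
    # Stand-in for an agent that would write and run a script.
    return f"[code] {step.description}"


# Hypothetical routing table from action kind to the agent that handles it.
AGENTS: Dict[str, Callable[[Step], str]] = {
    "gui": gui_agent,
    "code": coding_agent,
}


def run_task(steps: List[Step]) -> List[str]:
    """Dispatch each step to the agent matching its kind and collect the logs."""
    log: List[str] = []
    for step in steps:
        agent = AGENTS.get(step.kind)
        if agent is None:
            raise ValueError(f"unknown action kind: {step.kind!r}")
        log.append(agent(step))
    return log


if __name__ == "__main__":
    plan = [
        Step("open the spreadsheet application", "gui"),
        Step("sum a column across 200 files with a script", "code"),
        Step("click the Export button in the dialog", "gui"),
    ]
    for line in run_task(plan):
        print(line)
```

In this sketch the routing decision is already attached to each step. In a real system that decision is the hard part, and it would be made by a planning component rather than hard-coded.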

Why does it matter?

This matters because it can make automating complicated computer tasks faster and more reliable, which can help people save time and reduce errors in software operations.

Abstract

A multi-agent system that combines GUI control with programmatic execution improves efficiency and success in complex computer automation tasks.