CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang

2025-08-28

Summary

This paper introduces CODA, a trainable agent framework for automating graphical user interfaces (GUIs) in specialized scientific software, where tasks demand both long-horizon planning and precise execution.

What's the problem?

Current GUI agents face a trade-off. Generalist agents are good at *planning* what to do but poor at actually *doing* it precisely, while specialized agents execute precisely but struggle to plan a long series of steps toward a goal. Frameworks that pair a planner with an actor are typically static and non-trainable, so they cannot adapt from experience. That is a serious limitation because high-quality data in scientific domains is scarce.

What's the solution?

The researchers created CODA, a trainable compositional framework that pairs a generalist planner (the Cerebrum) with a specialist executor (the Cerebellum). Training works in two stages. In the first, Specialization, a decoupled GRPO approach trains a separate expert planner for each scientific application, starting from a small set of task trajectories. In the second, Generalization, the successful trajectories from all of these experts are combined into one dataset, which is used for supervised fine-tuning of the final planner so that it works well across applications. Think of it like several specialists each mastering one tool, then pooling their best worked examples to train a single generalist.
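As a rough illustration of the second stage, the following Python sketch pools successful trajectories from per-application experts into one supervised fine-tuning set. The `Trajectory` fields and the `build_sft_dataset` helper are hypothetical names chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    """One rollout from a specialized planner (hypothetical schema)."""
    app: str          # application the expert planner was trained on
    task: str         # natural-language task description
    plans: List[str]  # planner output at each step
    success: bool     # whether the task was judged as completed


def build_sft_dataset(per_app: Dict[str, List[Trajectory]]) -> List[dict]:
    """Keep only successful trajectories and flatten them into SFT examples."""
    dataset = []
    for app, trajectories in per_app.items():
        for traj in trajectories:
            if not traj.success:
                continue  # failed rollouts are not used as supervision
            for step, plan in enumerate(traj.plans):
                dataset.append(
                    {"app": app, "task": traj.task, "step": step, "target": plan}
                )
    return dataset
```

Filtering on success before pooling means the final planner is only ever supervised on plans that led to a completed task.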

Why does it matter?

This is important because it lets agents automate complex tasks in scientific software more reliably, which could speed up research and discovery. On four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and sets a new state of the art among open-source models.

Abstract

Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
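The Specialization stage relies on GRPO, which scores each sampled rollout relative to the other rollouts in its group rather than against a learned value function. A minimal sketch of that group-relative advantage follows. The function name, the `eps` constant, and the reward values are illustrative assumptions. How CODA decouples planner updates from the executor is not specified in the abstract and is not modeled here.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward by the mean and std of its own group.

    `rewards` holds one scalar reward per rollout sampled for the same task.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts of one task, two of which succeeded (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one. When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.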
