Agent models: Internalizing Chain-of-Action Generation into Reasoning models

Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang

2025-03-11

Summary

This paper introduces AutoCoA, a training framework that teaches AI models to decide for themselves when and how to use tools such as calculators or databases, planning their own steps instead of relying on external instructions.

What's the problem?

Current AI models depend on detailed external prompts to use tools effectively. This limits their autonomy and makes them slow and clumsy on tasks that require multiple steps or long-term planning, such as solving math problems or researching answers.

What's the solution?

AutoCoA trains a model to decide on its own when and how to use tools by combining supervised fine-tuning on step-by-step examples with reinforcement learning (trial and error). It also gives the model an internal world model that simulates the environment, which reduces the number of costly real-environment interactions needed during training.

Why does it matter?

This makes AI more independent and efficient at complex tasks, helping it handle real-world challenges like customer service or research where quick, multi-step decisions are needed.

Abstract

Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We position Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA
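
To make the idea of a model that switches between reasoning and action on its own more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a policy function that returns one step at a time and marks actions and answers with the hypothetical prefixes `<action>` and `<answer>`; the function names, the step format, and the toy policy are all illustrative assumptions. The point is only that the loop contains no external prompt telling the model when to call a tool: the policy's own output triggers the action.

```python
from typing import Callable, Dict, List

# Hypothetical markers chosen for this sketch; the paper's actual format may differ.
ACTION_PREFIX = "<action>"
ANSWER_PREFIX = "<answer>"
OBSERVATION_PREFIX = "observation: "


def run_chain_of_action(
    policy: Callable[[List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> str:
    """Let the policy decide, step by step, whether to think, act, or answer."""
    context = [f"question: {question}"]
    for _ in range(max_steps):
        step = policy(context)
        context.append(step)
        if step.startswith(ANSWER_PREFIX):
            return step[len(ANSWER_PREFIX):].strip()
        if step.startswith(ACTION_PREFIX):
            # Assumed action form: "<action>tool_name: argument"
            name, _, arg = step[len(ACTION_PREFIX):].partition(":")
            tool = tools.get(name.strip())
            result = tool(arg.strip()) if tool else f"unknown tool {name.strip()!r}"
            context.append(OBSERVATION_PREFIX + result)
        # Any other step is plain reasoning and simply stays in the context.
    return ""  # no answer produced within the step budget


def toy_policy(context: List[str]) -> str:
    """Stand-in for a trained model: think once, search once, then answer."""
    if len(context) == 1:
        return "I should look this up."
    if not any(s.startswith(OBSERVATION_PREFIX) for s in context):
        return f"{ACTION_PREFIX}search: capital of France"
    return f"{ANSWER_PREFIX} {context[-1][len(OBSERVATION_PREFIX):]}"


if __name__ == "__main__":
    fake_search = {"search": lambda query: "Paris"}
    print(run_chain_of_action(toy_policy, fake_search, "What is the capital of France?"))
```

In this sketch the `tools` mapping could point either at a real environment or at a simulated one, which is one hypothetical way an internal world model might stand in for real interactions during training.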
