
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu

2025-10-08

Summary

This paper introduces AgentFlow, a new way to build AI agents powered by large language models that can effectively use tools to solve complex problems. It focuses on improving how these agents learn and adapt during interactions, rather than relying on pre-programmed steps or training that doesn't reflect real-time problem-solving.

What's the problem?

Current AI agents built on large language models often struggle when tasks are long or require many different tools. A single monolithic model typically interleaves all of its reasoning and tool calls in one growing context, which scales poorly and makes it hard to adapt to new situations. Existing 'agentic' systems, which split tasks across specialized modules, are usually either not trained at all or trained offline, separately from the live, multi-turn interactions in which they actually operate, which limits how much they can improve.

What's the solution?

AgentFlow addresses this by coordinating four distinct modules: a planner that decides what to do next, an executor that carries out those plans by calling tools, a verifier that checks the results, and a generator that composes the final answer. The modules share an evolving memory of what has happened so far, and the planner is trained directly inside this multi-turn loop. A key innovation is a new training method called Flow-GRPO, which broadcasts the final, verifiable outcome of each complete attempt back to every planning step, so each local decision is credited according to whether the whole attempt succeeded. Combined with group-normalized advantages, this turns a hard long-horizon learning problem into a sequence of simpler single-turn updates and keeps training stable.
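The summary does not spell out the module interfaces, but the coordination pattern can be sketched roughly as follows. The function names, signatures, and the dictionary-based memory here are illustrative assumptions, not AgentFlow's actual API; the planner is the only module trained by Flow-GRPO.

```python
def agentflow_episode(query, plan_fn, execute_fn, verify_fn, generate_fn, max_turns=10):
    """Rough sketch of a four-module, in-the-flow loop with an evolving memory.
    All callables are user-supplied stand-ins for the planner, executor,
    verifier, and generator modules (hypothetical interfaces)."""
    memory = []  # evolving memory shared across turns
    for turn in range(max_turns):
        plan = plan_fn(query, memory)                       # planner: decides the next sub-goal / tool call
        result = execute_fn(plan)                           # executor: runs the chosen tool
        solved, feedback = verify_fn(query, plan, result)   # verifier: checks whether the result resolves the query
        memory.append({"turn": turn, "plan": plan, "result": result, "feedback": feedback})
        if solved:
            break
    return generate_fn(query, memory)                       # generator: composes the final answer from memory
```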

Why it matters?

AgentFlow represents a significant step forward in building more capable and adaptable AI agents. It outperforms strong baselines across ten benchmarks spanning search, agentic tool use, math, and science, even surpassing larger proprietary models such as GPT-4o. This means we're getting closer to AI systems that can reliably tackle complex real-world problems and keep improving as they go.

Abstract

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the live dynamics of multi-turn interaction. We introduce AgentFlow, a trainable, in-the-flow agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. To train on-policy in live environments, we propose Flow-based Group Refined Policy Optimization (Flow-GRPO), which tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. Across ten benchmarks, AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks, even surpassing larger proprietary models like GPT-4o. Further analyses confirm the benefits of in-the-flow optimization, showing improved planning, enhanced tool-calling reliability, and positive scaling with model size and reasoning turns.
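To make the credit-assignment idea in the abstract concrete, here is a rough numerical sketch of how a single trajectory-level outcome could be group-normalized and then broadcast to every turn of each rollout. The function name, shapes, and the epsilon constant are assumptions for illustration; the actual Flow-GRPO objective also includes the per-turn policy-gradient update, which is omitted here.

```python
import numpy as np

def flow_grpo_advantages(group_rewards, turns_per_trajectory):
    """group_rewards: final verifiable outcome (e.g. 0/1) for each rollout in a group
    sampled for the same query.
    turns_per_trajectory: number of planner turns in each rollout."""
    r = np.asarray(group_rewards, dtype=float)
    # GRPO-style group normalization: center and scale rewards within the group.
    adv = (r - r.mean()) / (r.std() + 1e-8)
    # Broadcast each trajectory's single outcome-based advantage to all of its turns,
    # converting multi-turn credit assignment into a sequence of single-turn updates.
    return [np.full(n_turns, a) for a, n_turns in zip(adv, turns_per_trajectory)]

# Example: four rollouts for one query, two of which succeed.
per_turn_advantages = flow_grpo_advantages([1.0, 0.0, 1.0, 0.0], [3, 5, 2, 4])
```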