Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, Jiecao Chen
2025-01-22

Summary
This paper introduces Agent-R, a new way to train AI language models to reflect on their actions and fix their mistakes while they're working on tasks. It's like teaching a computer to double-check its work and learn from its errors, much as students are taught to review their answers on a test.
What's the problem?
Current AI language models are great at imitating what experts do, but they struggle when things go wrong because they don't know how to recover from their own mistakes. It's like a student who can follow instructions perfectly but gets stuck when something unexpected happens. It's also very hard and expensive for humans to hand-write corrections for every possible mistake an AI could make.
What's the solution?
The researchers created Agent-R, which uses a search technique called Monte Carlo Tree Search (MCTS) to help the AI practice fixing its mistakes. Instead of just telling the AI whether it's right or wrong, Agent-R shows it how to get back on track after it messes up. The AI learns to spot its first mistake quickly and figure out how to fix it, rather than waiting until the end of a task to realize something went wrong. Agent-R repeats this process over multiple rounds, helping the AI get better at catching and fixing errors on its own.
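To make the core idea concrete, here is a minimal sketch of how such a "revision trajectory" could be assembled from an MCTS tree: keep the failed rollout up to its first bad step, mark the reflection point, then continue along a sibling branch that leads to success. The Node class, the reflection marker, and the function names are illustrative assumptions, not the authors' actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step (action + observation) in an MCTS tree over agent rollouts."""
    action: str
    observation: str = ""
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)


def path_to(node: Node) -> list[Node]:
    """Return the root-to-node sequence of steps."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return list(reversed(path))


def build_revision_trajectory(bad_leaf: Node, good_leaf: Node,
                              first_error_index: int) -> list[str]:
    """Splice a failed rollout with a successful sibling branch.

    Keep the failed trajectory up to and including its first erroneous step,
    emit a reflection marker, then continue along the good trajectory from
    the step that shares the same parent node as the error.
    """
    bad_path = path_to(bad_leaf)
    good_path = path_to(good_leaf)

    revision = [n.action for n in bad_path[: first_error_index + 1]]
    revision.append("[reflect] That action was a mistake; revising the plan.")
    revision.extend(n.action for n in good_path[first_error_index:])
    return revision
```

In the paper, the "first error" is identified by the actor model itself rather than by an external judge, so the spliced training data stays within what the current policy can recognize and learn from.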
Why it matters?
This matters because it could make AI assistants much more reliable and useful in the real world. Imagine a digital helper that can not only follow instructions but also figure out what went wrong when it makes a mistake, just like a human would. This could lead to AI that handles more complex tasks without needing constant human supervision. It's a big step towards creating AI that thinks more like humans do, adapting to new situations and learning from experience.
Abstract
Large Language Model (LLM) agents are increasingly pivotal for addressing complex tasks in interactive environments. Existing work mainly focuses on enhancing performance through behavior cloning from stronger experts, yet such approaches often falter in real-world applications, mainly due to the inability to recover from errors. However, step-level critique data is difficult and expensive to collect. Automating and dynamically constructing self-critique datasets is thus crucial to empowering models with intelligent agent capabilities. In this work, we propose an iterative self-training framework, Agent-R, that enables language Agents to Reflect on the fly. Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages Monte Carlo Tree Search (MCTS) to construct training data that recover correct trajectories from erroneous ones. A key challenge of agent reflection lies in the necessity for timely revision rather than waiting until the end of a rollout. To address this, we introduce a model-guided critique construction mechanism: the actor model identifies the first error step (within its current capability) in a failed trajectory. Starting from this step, we splice the erroneous prefix with the adjacent correct path, which shares the same parent node in the tree. This strategy enables the model to learn reflection based on its current policy, thereby yielding better learning efficiency. To further explore the scalability of this self-improvement paradigm, we investigate iterative refinement of both error correction capabilities and dataset construction. Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction. Experiments on three interactive environments show that Agent-R effectively equips agents to correct erroneous actions while avoiding loops, achieving superior performance compared to baseline methods (+5.59%).
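The iterative loop the abstract describes, collecting rollouts, building revision data from the model's own first-error judgments, fine-tuning, and repeating, can be outlined as below. This is a toy, runnable skeleton under simplified assumptions (trajectories are lists of correct/incorrect step flags, and "fine-tuning" is a stand-in numeric update); none of the names come from the paper's code.

```python
from __future__ import annotations

import random


def rollout(success_prob: float, length: int = 5) -> list[int]:
    """Toy rollout under the current policy: 1 = correct step, 0 = erroneous step."""
    return [1 if random.random() < success_prob else 0 for _ in range(length)]


def first_error(trajectory: list[int]) -> int | None:
    """Stand-in for the model-guided critique: index of the first erroneous step."""
    return trajectory.index(0) if 0 in trajectory else None


def splice(bad: list[int], good: list[int], step: int) -> list[int]:
    """Keep the bad prefix up to the first error, then continue on the good branch."""
    return bad[: step + 1] + good[step:]


def self_training(success_prob: float = 0.6, iterations: int = 3,
                  rollouts_per_iter: int = 64) -> float:
    """Iterate: collect failed rollouts, build revision data, 'fine-tune', repeat."""
    for it in range(iterations):
        revision_data = []
        for _ in range(rollouts_per_iter):
            bad, good = rollout(success_prob), [1] * 5  # a failed and a correct branch
            step = first_error(bad)
            if step is not None:
                revision_data.append(splice(bad, good, step))
        # Stand-in for fine-tuning on the spliced trajectories: nudge the policy
        # toward producing correct steps more often.
        if revision_data:
            success_prob += 0.1 * (1.0 - success_prob)
        print(f"iteration {it}: {len(revision_data)} revision trajectories, "
              f"step success probability now {success_prob:.2f}")
    return success_prob


if __name__ == "__main__":
    random.seed(0)
    self_training()
```

The real framework operates on LLM-generated action and observation sequences in interactive environments and fine-tunes the actor model on the spliced trajectories, but the control flow has the same shape: search, model-guided error localization, splicing, training, and iterating.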