Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Xiao Yu, Baolin Peng, Michel Galley, Hao Cheng, Qianhui Wu, Janardhan Kulkarni, Suman Nath, Zhou Yu, Jianfeng Gao

2025-10-13

Summary

This paper examines why AI models that excel at math and coding still struggle with tasks that require long-horizon planning and interaction with an environment, such as operating a computer or navigating the web.

What's the problem?

Current AI models are good at solving self-contained problems but lack the ability to 'think ahead' and weigh different possibilities before acting in complex, real-world situations. Humans often mentally simulate future outcomes before making a decision, a process called 'vicarious trial and error', and AI agents largely lack this skill. As a result, they perform poorly on tasks that demand long-term planning and adaptation.

What's the solution?

The researchers developed a two-stage training framework called Dyna-Mind. In the first stage, ReSim trains the agent to produce structured reasoning traces from search trees expanded using real experience gathered through environment interaction, so its predictions about what will happen next are grounded in actual dynamics rather than pure imagination. In the second stage, Dyna-GRPO refines this ability with online reinforcement learning, letting the agent learn both from the final outcomes of its actions and from the intermediate states it passes through, steadily improving its planning and decision-making.
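To make the first stage concrete, here is a minimal toy sketch of ReSim-style data construction: expand a small search tree using real environment transitions, then linearize it into a textual "imagine the future" trace that the agent could be trained to imitate. All function names and the toy dynamics are illustrative assumptions; the paper does not publish this exact code.

```python
# Hypothetical sketch of ReSim-style trace construction (not the
# authors' implementation). A search tree is expanded with real
# environment dynamics, then flattened into a reasoning trace.

def expand_tree(env_step, state, actions, depth=2):
    """Recursively try each action from `state` using the real
    transition function, recording (action, resulting subtree)."""
    if depth == 0:
        return {"state": state, "children": []}
    children = []
    for a in actions:
        nxt = env_step(state, a)
        children.append({"action": a,
                         "tree": expand_tree(env_step, nxt, actions, depth - 1)})
    return {"state": state, "children": children}

def linearize(tree, indent=0):
    """Turn the tree into lines of the form
    'if I take <action>, I reach <state>' (depth-first)."""
    lines = []
    for c in tree["children"]:
        lines.append("  " * indent +
                     f"if I take {c['action']}, I reach {c['tree']['state']}")
        lines.extend(linearize(c["tree"], indent + 1))
    return lines

# Toy environment: states are integers, actions add or subtract 1.
trace = linearize(expand_tree(lambda s, a: s + a, 0, [1, -1], depth=2))
```

In the real system the "states" are rich observations (e.g. app screens) and the trace becomes supervised training data, but the core idea is the same: the simulated futures in the agent's reasoning come from experience, not hallucination.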

Why it matters?

This research is important because it highlights the need for AI to be able to simulate and anticipate future events, just like humans do. By giving AI this ability, we can create more intelligent agents that can effectively handle complex tasks and navigate challenging environments, ultimately making AI more useful and adaptable in the real world.

Abstract

Reasoning models have recently shown remarkable progress in domains such as math and coding. However, their expert-level abilities in math and coding contrast sharply with their performance in long-horizon, interactive tasks such as web navigation and computer/phone use. Inspired by literature on human cognition, we argue that current AI agents need "vicarious trial and error" - the capacity to mentally simulate alternative futures before acting - in order to enhance their understanding and performance in complex interactive environments. We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning. In stage 1, we introduce Reasoning with Simulations (ReSim), which trains the agent to generate structured reasoning traces from expanded search trees built from real experience gathered through environment interactions. ReSim thus grounds the agent's reasoning in faithful world dynamics and equips it with the ability to anticipate future states in its reasoning. In stage 2, we propose Dyna-GRPO, an online reinforcement learning method to further strengthen the agent's simulation and decision-making ability by using both outcome rewards and intermediate states as feedback from real rollouts. Experiments on two synthetic benchmarks (Sokoban and ALFWorld) and one realistic benchmark (AndroidWorld) demonstrate that (1) ReSim effectively infuses simulation ability into AI agents, and (2) Dyna-GRPO leverages outcome and interaction-level signals to learn better policies for long-horizon, planning-intensive tasks. Together, these results highlight the central role of simulation in enabling AI agents to reason, plan, and act more effectively in ever more challenging environments.
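Dyna-GRPO builds on GRPO, whose core mechanism is to score each of several rollouts of the same task relative to its group. The sketch below shows only that standard group-relative advantage computation; the intermediate-state feedback that distinguishes Dyna-GRPO, and all function names here, are assumptions not taken from the paper.

```python
# Illustrative GRPO-style advantage computation: sample a group of
# rollouts for one task, then normalize each outcome reward by the
# group's mean and standard deviation. Dyna-GRPO (per the abstract)
# additionally uses intermediate states as feedback, which is
# omitted here.

def group_relative_advantages(rewards, eps=1e-8):
    """Return (r - mean) / (std + eps) for each reward in the group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four rollouts of the same task; two succeed (reward 1.0).
# Successful rollouts get positive advantage, failed ones negative.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the baseline comes from the group itself, no learned value function is needed, which is one reason GRPO-style methods are popular for training reasoning agents.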