
Scaling Agent Learning via Experience Synthesis

Zhaorun Chen, Zhuokai Zhao, Kai Zhang, Bo Liu, Qi Qi, Yifan Wu, Tarun Kalluri, Sara Cao, Yuanhao Xiong, Haibo Tong, Huaxiu Yao, Hengduo Li, Jiacheng Zhu, Xian Li, Dawn Song, Bo Li, Jason Weston, Dat Huynh

2025-11-07


Summary

This paper introduces DreamGym, a new system designed to make it easier and more efficient to train AI agents with reinforcement learning, especially agents built on large language models.

What's the problem?

Training AI agents with reinforcement learning is tough because it usually requires a lot of trial and error in a real environment, which is expensive and slow. Reliable reward signals are hard to obtain, the variety of available training tasks is limited, and the infrastructure needed to run everything is complex. In short, collecting enough useful experience data to train these agents effectively is a major bottleneck.

What's the solution?

DreamGym solves this by letting the agent practice in a synthetic environment instead of the real one. A reasoning-based "experience model" predicts what would happen next and what reward the agent would receive after each action, so rollouts can be generated cheaply and at scale. The system starts from a buffer of real-world trajectories to ground these predictions, keeps enriching that buffer with fresh interactions as training goes on, and adaptively generates new tasks that challenge the current agent. This allows for a huge amount of practice without the cost and limitations of the real world.
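To make the idea concrete, here is a minimal sketch of what a synthetic rollout loop of this kind could look like. It is only an illustration of the general pattern (an experience model that predicts the next state and reward, and a replay buffer that stores the resulting transitions); all class and function names, such as ExperienceModel and ReplayBuffer, are assumptions for this sketch and not the paper's actual interface.

from dataclasses import dataclass, field
import random


@dataclass
class Transition:
    state: str
    action: str
    next_state: str
    reward: float


@dataclass
class ReplayBuffer:
    """Seeded with offline real-world trajectories, then enriched online."""
    transitions: list = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def sample(self, k: int) -> list:
        return random.sample(self.transitions, min(k, len(self.transitions)))


class ExperienceModel:
    """Stand-in for the reasoning-based model that predicts the next state
    and reward from (state, action); here it is just a stub."""

    def step(self, state: str, action: str) -> tuple[str, float]:
        next_state = f"{state} -> {action}"          # placeholder transition
        reward = 1.0 if "goal" in action else 0.0    # placeholder feedback
        return next_state, reward


def synthetic_rollout(policy, env_model: ExperienceModel, task: str,
                      buffer: ReplayBuffer, horizon: int = 5) -> float:
    """Collect one synthetic episode and store its transitions in the buffer."""
    state, total = task, 0.0
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = env_model.step(state, action)
        buffer.add(Transition(state, action, next_state, reward))
        state, total = next_state, total + reward
    return total


if __name__ == "__main__":
    buffer = ReplayBuffer()
    model = ExperienceModel()

    def policy(state: str) -> str:
        # Toy policy standing in for the LLM agent being trained.
        return "reach goal" if len(state) > 40 else "explore"

    episode_return = synthetic_rollout(policy, model,
                                       task="open the settings page",
                                       buffer=buffer)
    print(f"episode return: {episode_return}, buffer size: {len(buffer.transitions)}")

In a real setup the stubbed policy and experience model would both be language models, and the collected transitions would feed a standard RL update (e.g., PPO or GRPO) instead of just being printed.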

Why it matters?

DreamGym is important because it makes reinforcement learning more practical for complex tasks. It allows agents to learn much faster and more efficiently, even in situations where real-world training is difficult or impossible. It also provides a way to ‘warm-start’ agents, meaning they can quickly adapt to the real world after being trained in the simulation, requiring far fewer real-world interactions.

Abstract

While reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
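The abstract also mentions adaptively generating tasks that challenge the current policy. The sketch below shows one simple way such a curriculum could be organized: tasks the agent solves about half the time are treated as the most informative and sampled more often. The scoring rule and the TaskCurriculum class are assumptions made for illustration, not the paper's actual method.

from collections import defaultdict
import random


class TaskCurriculum:
    def __init__(self, seed_tasks: list[str]):
        self.tasks = list(seed_tasks)
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)

    def record(self, task: str, solved: bool) -> None:
        # Update running statistics after each rollout on this task.
        self.attempts[task] += 1
        self.successes[task] += int(solved)

    def difficulty_score(self, task: str) -> float:
        """Score peaks when the success rate is near 0.5, i.e. the task is
        neither trivially easy nor hopelessly hard for the current policy."""
        if self.attempts[task] == 0:
            return 1.0                          # always worth trying unseen tasks
        rate = self.successes[task] / self.attempts[task]
        return 1.0 - abs(rate - 0.5) * 2.0

    def next_task(self) -> str:
        # Sample tasks in proportion to how challenging they currently are.
        weights = [self.difficulty_score(t) + 1e-3 for t in self.tasks]
        return random.choices(self.tasks, weights=weights, k=1)[0]


if __name__ == "__main__":
    curriculum = TaskCurriculum(["click the login button",
                                 "filter products by price"])
    for _ in range(20):
        task = curriculum.next_task()
        curriculum.record(task, solved=random.random() < 0.7)  # fake outcome
    print("next challenging task:", curriculum.next_task())

The same statistics could also drive the generation of entirely new task variants, which is closer to what the abstract describes; this sketch only shows the selection side of such a curriculum.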