
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Zhaoyang Wang, Canwen Xu, Boyi Liu, Yite Wang, Siwei Han, Zhewei Yao, Huaxiu Yao, Yuxiong He

2026-02-11


Summary

This paper introduces a new way to create realistic but entirely computer-generated worlds for training AI agents. These are the kind of agents that complete tasks by calling different tools and interacting with their surroundings, like a virtual assistant.

What's the problem?

Training these AI agents to handle complex tasks is hard because it requires a large variety of scenarios for them to learn from. Collecting those scenarios from the real world is expensive and inconsistent, and simply having a language model imagine the scenarios isn't reliable enough, because the simulated world can change unexpectedly or contradict itself.

What's the solution?

The researchers developed a system called Agent World Model (AWM) that automatically builds these training environments. Instead of relying on language models to *imagine* the world, AWM uses code and databases to create worlds whose behavior is consistent and predictable. Each world offers a rich set of tools the agent can call (about 35 per environment on average), and because every action runs real code against a database, the system can track exactly what is happening in the environment. Using this pipeline, they created 1,000 of these worlds covering everyday situations.
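To make this concrete, here is a minimal sketch of what a code-driven, database-backed environment could look like. The calendar scenario, table layout, and tool names are invented for illustration; the paper's actual environments and tool schemas are richer and differ in detail.

```python
import sqlite3

# Illustrative sketch only: a tiny code-driven environment whose state lives
# in a real database, so every tool call is a deterministic, inspectable
# state transition rather than text imagined by an LLM simulator.

class CalendarWorld:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, day TEXT)"
        )

    # Tools are ordinary functions that read and write the database.
    def add_event(self, title: str, day: str) -> str:
        self.db.execute("INSERT INTO events (title, day) VALUES (?, ?)", (title, day))
        self.db.commit()
        return f"Added '{title}' on {day}."

    def list_events(self, day: str) -> list:
        cur = self.db.execute("SELECT title FROM events WHERE day = ?", (day,))
        return cur.fetchall()


env = CalendarWorld()
print(env.add_event("Dentist", "2026-03-01"))   # tool call -> database write
print(env.list_events("2026-03-01"))            # observation read back from the database
```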

Why it matters?

This is important because it lets researchers train AI agents at massive scale without the limitations of real-world data collection or unreliable LLM-simulated worlds. Because the environments are built with code and databases, it's easy to define clear goals and reward signals for the agents, and agents trained purely in these synthetic worlds still perform well when tested in new, unseen situations. This means we can build more capable and reliable AI assistants.
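As an illustration of why executable, database-backed environments make reward design easier, the sketch below scores a task by checking whether the required records exist in the final database state. The schema and goal format are assumptions made for this example, not the paper's exact reward functions.

```python
import sqlite3

# Hedged sketch: a reward that inspects the final database state and returns
# 1.0 only if every record required by the goal specification is present.

def check_goal(db: sqlite3.Connection, goal: dict) -> float:
    for table, required_rows in goal.items():
        for row in required_rows:
            where = " AND ".join(f"{col} = ?" for col in row)
            count = db.execute(
                f"SELECT COUNT(*) FROM {table} WHERE {where}", tuple(row.values())
            ).fetchone()[0]
            if count == 0:
                return 0.0   # a required record is missing, so the task failed
    return 1.0

# Tiny end-to-end check with a throwaway in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (title TEXT, day TEXT)")
db.execute("INSERT INTO events VALUES ('Dentist', '2026-03-01')")
goal = {"events": [{"title": "Dentist", "day": "2026-03-01"}]}
print(check_goal(db, goal))   # -> 1.0
```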

Abstract

Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Thanks to the fully executable environments and accessible database states, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.
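To give a rough sense of how multi-turn trajectories and rewards might be collected from such environments for reinforcement learning, here is a self-contained toy rollout loop. The `ToyEnv`, its single tool, and the scripted policy are hypothetical stand-ins for the paper's LLM agents and much richer environments (about 35 tools each on average).

```python
import sqlite3

# Schematic sketch of a multi-turn rollout: the agent issues tool calls, the
# environment executes them against its database, and a terminal reward is
# read off the final database state. Not the paper's actual training code.

class ToyEnv:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE events (title TEXT)")

    def call_tool(self, name: str, args: dict) -> str:
        if name == "add_event":
            self.db.execute("INSERT INTO events VALUES (?)", (args["title"],))
            return f"added {args['title']}"
        return "unknown tool"

    def reward(self, goal_title: str) -> float:
        # Reward is read directly from the database state.
        n = self.db.execute(
            "SELECT COUNT(*) FROM events WHERE title = ?", (goal_title,)
        ).fetchone()[0]
        return 1.0 if n > 0 else 0.0


def scripted_policy(observation: str):
    # Stand-in for an LLM agent: book the appointment, then stop.
    if "added" in observation:
        return "finish", {}
    return "add_event", {"title": "Dentist"}


env, goal = ToyEnv(), "Dentist"
trajectory, obs = [], "Task: book a dentist appointment."
for _ in range(5):                                # multi-turn interaction loop
    tool, args = scripted_policy(obs)
    if tool == "finish":
        break
    obs = env.call_tool(tool, args)
    trajectory.append((tool, args, obs))
print(trajectory, env.reward(goal))               # trajectory + terminal reward for RL
```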