Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long

2026-04-27

Summary

This paper argues that to truly accomplish complex goals, AI systems need to get better at understanding and predicting how the world around them works, not just reacting to immediate situations.

What's the problem?

Currently, AI struggles with tasks that require planning and understanding consequences over time, such as manipulating objects or navigating software. The term 'world model' – an AI's internal representation of its environment – means different things to different research communities, making it hard to compare approaches and build truly intelligent systems. Essentially, AI needs to move beyond predicting only the *next* thing that will happen toward understanding the underlying rules and being able to simulate future scenarios.

What's the solution?

The researchers propose a framework that categorizes world models along two axes: how capable they are (from simple one-step prediction, to full multi-step simulation, to self-improvement) and what kind of world they model (governed by physical laws, digital systems, social interactions, or scientific principles). They then surveyed over 400 existing works to see where each fits in this framework, identifying strengths, weaknesses, and common failure modes. They also propose better ways to evaluate these models and offer guidance for future development.
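The two-axis framework can be pictured as a small 3 × 4 grid of level-regime cells. The sketch below is purely illustrative – the names and the `classify` helper are this summary's invention, not the paper's code:

```python
# Illustrative sketch of the paper's two-axis "levels x laws" taxonomy
# as a simple lookup grid. Names are illustrative, not the paper's API.

CAPABILITY_LEVELS = ["L1 Predictor", "L2 Simulator", "L3 Evolver"]
LAW_REGIMES = ["physical", "digital", "social", "scientific"]

def classify(level: str, regime: str) -> str:
    """Place a system into one of the 3 x 4 level-regime cells."""
    if level not in CAPABILITY_LEVELS or regime not in LAW_REGIMES:
        raise ValueError("unknown level or regime")
    return f"{level} / {regime}"

# e.g. a model-based RL agent for robot manipulation might land in:
print(classify("L2 Simulator", "physical"))  # L2 Simulator / physical
```

The point of the grid is comparability: two systems in the same cell can be evaluated against the same constraints, while systems in different cells should not be judged by the same yardstick.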

Why it matters?

This work is important because it provides a common language and roadmap for building more sophisticated AI. By clarifying what a 'world model' actually *is* and how different approaches compare, it can help researchers collaborate more effectively and accelerate progress towards AI that can not only react to the world but also understand, predict, and even change it.

Abstract

As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a "levels x laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.
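The three capability levels in the abstract can be sketched as nested interfaces: an L1 Predictor learns a one-step transition, an L2 Simulator composes it into multi-step action-conditioned rollouts, and an L3 Evolver revises the model when predictions fail against evidence. Everything below (the toy 1-D dynamics, the `drift` parameter, the corrective update rule) is an assumed minimal example, not the paper's implementation:

```python
# Minimal sketch (assumed, not from the paper) of the L1/L2/L3 hierarchy.
from typing import List

State = float   # toy 1-D state, for illustration only
Action = float

class Predictor:
    """L1: one-step local transition operator s' = f(s, a)."""
    def __init__(self, drift: float = 1.0):
        self.drift = drift  # a single learned parameter (toy)

    def step(self, s: State, a: Action) -> State:
        return s + self.drift * a

class Simulator(Predictor):
    """L2: compose one-step predictions into a multi-step rollout."""
    def rollout(self, s: State, actions: List[Action]) -> List[State]:
        traj = [s]
        for a in actions:
            traj.append(self.step(traj[-1], a))
        return traj

class Evolver(Simulator):
    """L3: revise the model when predictions disagree with evidence."""
    def update(self, s: State, a: Action, observed: State, lr: float = 0.5):
        error = observed - self.step(s, a)
        if a != 0:
            self.drift += lr * error / a  # simple corrective update

wm = Evolver(drift=1.0)
print(wm.rollout(0.0, [1.0, 1.0]))   # [0.0, 1.0, 2.0]
wm.update(0.0, 1.0, observed=2.0)    # world actually moved twice as far
print(wm.step(0.0, 1.0))             # 1.5: prediction shifts toward evidence
```

The nesting mirrors the taxonomy's claim that each level subsumes the one below it: a simulator is built from one-step predictions, and an evolver is a simulator that can also rewrite its own dynamics.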