From Word to World: Can Large Language Models be Implicit Text-based World Models?

Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, Mengdi Wang

2025-12-25

Summary

This research explores whether large language models can act as 'world models' that let AI agents imagine and practice scenarios instead of relying only on real experience. It focuses on text-based worlds where the AI interacts entirely through language.

What's the problem?

AI agents are getting better at learning through trial and error, but this takes a lot of real-world experience, which can be hard to collect, expensive, or even dangerous. Real environments are also fixed and cover only a limited range of situations, so simply gathering more data doesn't always help. While 'world models' (simulations that let agents practice) could be a solution, it's unclear whether language models are good enough to build these simulations reliably, and under what conditions they actually improve an agent's performance.

What's the solution?

The researchers built a framework for testing how well language models can act as world models. It looks at three things: how accurate and internally consistent the simulated world is, how predictably the model improves with more data and larger model sizes, and whether it actually helps the agent perform better. They tested this across five text-based environments, using the language model to predict what happens next when the agent takes an action, to generate synthetic practice trajectories, and to give the agent a head start ('warm-start') in reinforcement learning.
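To make this concrete, here is a minimal sketch of what treating an LLM as a text-based world model could look like: the model predicts the next textual state from the current state and an action, and chaining predictions yields synthetic practice trajectories. The prompt wording, the `llm.generate` call, and the `policy` function are placeholders for illustration, not the paper's actual interfaces.

```python
# Illustrative sketch only: every name below is a placeholder, not the
# paper's implementation.
from typing import List, Tuple

def predict_next_state(llm, state: str, action: str) -> str:
    """Use a language model as a world model: next-state prediction
    conditioned on the current textual state and the agent's action."""
    prompt = (
        "You are simulating a text-based environment.\n"
        f"Current state:\n{state}\n"
        f"Agent action: {action}\n"
        "Describe the resulting next state:"
    )
    return llm.generate(prompt)  # hypothetical LLM call

def rollout(llm, policy, initial_state: str, horizon: int) -> List[Tuple[str, str, str]]:
    """Generate a synthetic trajectory entirely inside the imagined world,
    which could later serve as extra practice data for the agent."""
    trajectory, state = [], initial_state
    for _ in range(horizon):
        action = policy(state)                              # agent proposes an action
        next_state = predict_next_state(llm, state, action) # world model imagines the outcome
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory
```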

Why it matters?

This work shows that language models *can* be useful for building world models that help AI agents learn, but the benefits aren't automatic: they depend on how much of the relevant behavior the model has seen and how complex the environment is. This research helps us understand the limits of using language models for simulation and points the way toward making them more effective for training AI agents.

Abstract

Agentic reinforcement learning increasingly relies on experience-driven scaling, yet real-world environments remain non-adaptive, limited in coverage, and difficult to scale. World models offer a potential way to improve learning efficiency through simulated experience, but it remains unclear whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents. We study these questions in text-based environments, which provide a controlled setting to reinterpret language modeling as next-state prediction under interaction. We introduce a three-level framework for evaluating LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we find that sufficiently trained world models maintain coherent latent state, scale predictably with data and model size, and improve agent performance via action verification, synthetic trajectory generation, and warm-starting reinforcement learning. Meanwhile, these gains depend critically on behavioral coverage and environment complexity, delineating clear boundaries for when world modeling effectively supports agent learning.
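As an illustration of the 'action verification' use mentioned in the abstract, the sketch below scores candidate actions by imagining their outcomes in the world model before the agent commits to one in the real environment. The `world_model.predict` and `value_fn` interfaces and the greedy scoring rule are assumptions for the sake of the example, not the authors' implementation.

```python
# Illustrative sketch of action verification with an LLM world model.
# All interfaces here are hypothetical placeholders.
from typing import Callable, List

def verify_and_select(world_model, value_fn: Callable[[str], float],
                      state: str, candidates: List[str]) -> str:
    """Pick the candidate action whose simulated outcome looks best."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        imagined_next = world_model.predict(state, action)  # imagined next state (hypothetical API)
        score = value_fn(imagined_next)                      # e.g. a learned critic or heuristic
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```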