World Craft: Agentic Framework to Create Visualizable Worlds via Text

Jianwen Sun, Yukang Feng, Kaining Ying, Chuanhao Li, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Yifan Chang, Yu Dai, Yifei Huang, Kaipeng Zhang

2026-01-28

Summary

This paper introduces World Craft, a new system that lets people easily create virtual worlds for AI agents, like the kind used in AI Town, just by describing what they want in plain language.

What's the problem?

Currently, building these kinds of interactive worlds requires programming knowledge, which makes them difficult for most people to customize. It's hard for someone without coding skills to design a visual environment in which AI agents can act and interact.

What's the solution?

World Craft tackles this with two main parts. First, 'World Scaffold' provides a standardized, concise way to describe an interactive, game-like scene. Second, 'World Guild' uses multiple AI agents to work out what a user wants from a rough description and then automatically generate the structured content that World Scaffold needs, such as the environment layout and assets. The authors also built an error-correction dataset, constructed via reverse engineering, to improve the system's spatial knowledge and make layout generation more stable and controllable. In experiments, the framework outperformed commercial code agents (Cursor and Antigravity) and LLMs (Qwen3 and Gemini-3-Pro) at building scenes and conveying the user's intended narrative.
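
To make the idea of a structured scene description more concrete, here is a minimal Python sketch. It is not the paper's actual format: the `Scene` and `SceneObject` classes, their field names, and the `find_layout_errors` check are all hypothetical, chosen only to illustrate how a layout expressed as structured data can be checked automatically for spatial mistakes such as overlapping or out-of-bounds objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One placed asset. All field names are hypothetical, not from the paper."""
    name: str
    x: int       # left column of the object's footprint, in tiles
    y: int       # top row of the object's footprint, in tiles
    width: int
    height: int


@dataclass(frozen=True)
class Scene:
    """A rectangular tile grid plus the objects placed on it (assumed schema)."""
    width: int
    height: int
    objects: tuple[SceneObject, ...]


def overlaps(a: SceneObject, b: SceneObject) -> bool:
    """True when the two rectangular footprints share at least one tile."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def find_layout_errors(scene: Scene) -> list[str]:
    """Return readable problems; an empty list means the layout passed."""
    errors = []
    # First pass: every object must lie fully inside the grid.
    for obj in scene.objects:
        inside = (obj.x >= 0 and obj.y >= 0
                  and obj.x + obj.width <= scene.width
                  and obj.y + obj.height <= scene.height)
        if not inside:
            errors.append(f"{obj.name} is outside the scene bounds")
    # Second pass: no two objects may occupy the same tile.
    for i, a in enumerate(scene.objects):
        for b in scene.objects[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors


# A deliberately flawed example layout.
cafe = Scene(
    width=10,
    height=8,
    objects=(
        SceneObject("counter", x=0, y=0, width=4, height=1),
        SceneObject("table", x=3, y=0, width=2, height=2),  # collides with counter
        SceneObject("plant", x=9, y=7, width=2, height=1),  # extends past the edge
    ),
)

for problem in find_layout_errors(cafe):
    print(problem)
```

In a pipeline like the one described, messages of this kind could plausibly be fed back to a language model so it can repair the layout, but how World Craft actually detects and corrects errors is not specified in this summary.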

Why it matters?

This work is important because it makes creating these AI worlds much more accessible to everyone, not just programmers. This 'democratization' of environment creation opens up possibilities for more people to explore and experiment with AI agents in customized settings, benefiting both entertainment and research.

Abstract

Large Language Models (LLMs) motivate generative agent simulation (e.g., AI Town) to create a "dynamic world", holding immense value across entertainment and research. However, for non-experts, especially those without programming skills, it isn't easy to customize a visualizable environment by themselves. In this paper, we introduce World Craft, an agentic world creation framework to create an executable and visualizable AI Town via user textual descriptions. It consists of two main modules, World Scaffold and World Guild. World Scaffold is a structured and concise standardization to develop interactive game scenes, serving as an efficient scaffolding for LLMs to customize an executable AI Town-like environment. World Guild is a multi-agent framework to progressively analyze users' intents from rough descriptions, and synthesizes required structured contents (e.g., environment layout and assets) for World Scaffold. Moreover, we construct a high-quality error-correction dataset via reverse engineering to enhance spatial knowledge and improve the stability and controllability of layout generation, while reporting multi-dimensional evaluation metrics for further analysis. Extensive experiments demonstrate that our framework significantly outperforms existing commercial code agents (Cursor and Antigravity) and LLMs (Qwen3 and Gemini-3-Pro) in scene construction and narrative intent conveyance, providing a scalable solution for the democratization of environment creation.