LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Yinglin Duan, Zhengxia Zou, Tongwei Gu, Wei Jia, Zhan Zhao, Luyi Xu, Xinzhu Liu, Hao Jiang, Kang Chen, Shuang Qiu

2025-09-08

Summary

This paper introduces LatticeWorld, a new system for quickly creating realistic 3D virtual worlds using artificial intelligence.

What's the problem?

Creating detailed 3D environments for things like video games, training simulations, or testing robots is usually a slow and painstaking process done by hand. While AI is being used to help, current methods aren't efficient enough for large-scale production and often struggle to balance realism with speed.

What's the solution?

The researchers developed LatticeWorld, which combines a lightweight language model (LLaMA-2-7B) with an industry-grade rendering engine (Unreal Engine 5). Given text descriptions and visual instructions, it automatically builds a large, interactive 3D world with realistic physics, real-time rendering, and dynamic agents that can interact with each other. It is designed to make the production process much faster and easier.

Why does it matter?

LatticeWorld makes the creation of 3D worlds over 90 times more efficient than traditional manual production while maintaining high creative quality. This matters because it can lower the cost and time needed to develop applications in areas like robotics, self-driving cars, and entertainment, ultimately helping to narrow the gap between simulations and the real world.

Abstract

Recent research has increasingly focused on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across various domains, including embodied AI, autonomous driving, and entertainment. A more realistic simulation with accurate physics will effectively narrow the sim-to-real gap and allow us to gather rich information about the real world conveniently. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches have leveraged advanced machine learning algorithms for 3D world generation, with the most recent advances focusing on generative methods that can create virtual worlds based on user instructions. This work explores this research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. Our proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld, showing that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, LatticeWorld achieves over a 90× increase in industrial production efficiency while maintaining high creative quality compared with traditional manual production methods. Our demo video is available at https://youtu.be/8VWZXpERR18
