HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, Junta Wu, Zixiao Gu, Haoyuan Wang, Xuhui Zuo, Tianyu Huang, Wenhuan Li, Sheng Zhang, Yihang Lian, Yulin Tsai, Lifu Wang, Sicong Liu, Puhua Jiang, Xianghui Yang, Dongyuan Guo, Yixuan Tang, Xinyue Mao, Jiaao Yu, Junlin Yu, Jihong Zhang
2025-07-30
Summary
This paper introduces HunyuanWorld 1.0, a system that creates detailed, interactive 3D worlds from simple text descriptions or images. It organizes the 3D scene into semantic layers, which makes the generated world easy to explore and interact with.
What's the problem?
Automatically generating 3D worlds from words or pictures is hard: the worlds must be realistic and coherent, and they must let people move around and interact with objects. Previous methods could not achieve all of these qualities at once.
What's the solution?
HunyuanWorld 1.0 addresses this by building a layered 3D mesh, in which each layer carries semantic meaning (for example sky, distant background, and foreground objects), and by using panoramic proxies that stand in for the full surrounding environment. Together, these design choices let the system generate worlds that are coherent, feel real, and can be explored and interacted with freely.
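To make the layered idea concrete, here is a minimal sketch of what a semantically layered world representation might look like. All class and field names here are hypothetical illustrations of the concept described above, not the authors' actual code or API:

```python
from dataclasses import dataclass, field

@dataclass
class SceneLayer:
    """One semantic layer of the scene (e.g. sky, terrain, objects).

    Hypothetical structure for illustration only.
    """
    name: str     # semantic label for the layer
    depth: float  # representative distance from the viewer
    meshes: list = field(default_factory=list)  # placeholder geometry

@dataclass
class LayeredWorld:
    """A world stored as a stack of semantic layers."""
    layers: list = field(default_factory=list)

    def add_layer(self, layer: SceneLayer) -> None:
        # Keep layers sorted far-to-near, so they can be
        # composited back-to-front when rendering.
        self.layers.append(layer)
        self.layers.sort(key=lambda l: -l.depth)

    def layer_names(self) -> list:
        return [l.name for l in self.layers]

world = LayeredWorld()
world.add_layer(SceneLayer("foreground objects", depth=5.0))
world.add_layer(SceneLayer("sky panorama", depth=1000.0))
world.add_layer(SceneLayer("terrain", depth=50.0))
print(world.layer_names())  # far-to-near order
```

The point of the sketch is the separation itself: because each part of the scene lives in its own named layer, individual layers (such as foreground objects) can be edited or interacted with independently of the rest of the world.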
Why it matters?
This matters because it could make creating virtual worlds far easier and more powerful for games, education, virtual reality, and other applications where people want to experience and interact with computer-generated environments.
Abstract
HunyuanWorld 1.0 generates immersive 3D scenes from text and images using a semantically layered 3D mesh representation with panoramic world proxies, achieving state-of-the-art performance in coherence, exploration, and interactivity.