OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Yang Zhou, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Haoyu Guo, Zizun Li, Kaijing Ma, Xinyue Li, Yating Wang, Haoyi Zhu, Mingyu Liu, Dingning Liu, Jiange Yang, Zhoujie Fu, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Kaipeng Zhang, Tong He
2025-09-16

Summary
This paper introduces OmniWorld, a new, large-scale dataset created to help computers understand and model how the world changes in three-dimensional space over time, that is, in 4D.
What's the problem?
Building computer systems that can truly understand and predict how things move and interact in the real world is difficult because good training data is scarce. Existing datasets are often too simple, cover too few environments, or lack detailed annotations of how objects change position and shape over time, which makes it hard to train and evaluate these systems effectively.
What's the solution?
The researchers created OmniWorld, a large dataset that combines a newly collected synthetic dataset, OmniWorld-Game, with several curated public datasets from different domains. It covers many types of information (modalities), such as color images, depth maps, and object positions, across a wide range of scenarios with complex movements and interactions; a rough sketch of what one such multi-modal frame could look like follows below. The researchers also built a challenging benchmark on top of the dataset to test how well current models perform at tasks like reconstructing dynamic 3D scenes, predicting future events, and generating video under camera control.
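To make the multi-modal structure concrete, here is a minimal sketch of how one frame of such a dataset could be represented in code. All names here (`Frame4D`, `make_dummy_frame`, the field layout) are hypothetical illustrations, not OmniWorld's actual schema or loading API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame4D:
    """One hypothetical time step of a multi-modal 4D sample."""
    rgb: np.ndarray          # (H, W, 3) color image
    depth: np.ndarray        # (H, W) per-pixel depth in meters
    camera_pose: np.ndarray  # (4, 4) world-from-camera transform
    timestamp: float         # seconds since the start of the clip

def make_dummy_frame(h: int = 480, w: int = 640, t: float = 0.0) -> Frame4D:
    """Build a synthetic frame to illustrate the per-frame layout."""
    return Frame4D(
        rgb=np.zeros((h, w, 3), dtype=np.uint8),
        depth=np.ones((h, w), dtype=np.float32),
        camera_pose=np.eye(4, dtype=np.float32),
        timestamp=t,
    )

# A clip is an ordered list of frames: the "4D" part is the sequence of
# (appearance, geometry, pose) triples evolving over time.
clip = [make_dummy_frame(t=i / 30.0) for i in range(8)]  # 8 frames at 30 fps
print(len(clip), clip[0].rgb.shape, clip[0].camera_pose.shape)
```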
Why it matters?
OmniWorld is important because it provides a much-needed resource for researchers building more intelligent machines. By offering a more realistic and comprehensive dataset, it enables the development and evaluation of more capable '4D world models', which are crucial for robotics, self-driving cars, and realistic virtual environments, and it ultimately helps computers gain a more complete understanding of the physical world.
Abstract
The field of 4D world modeling, aiming to jointly capture spatial geometry and temporal dynamics, has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning. However, the development of truly general 4D world models remains fundamentally constrained by the availability of high-quality data. Existing datasets and benchmarks often lack the dynamic complexity, multi-domain diversity, and spatial-temporal annotations required to support key tasks such as 4D geometric reconstruction, future prediction, and camera-control video generation. To address this gap, we introduce OmniWorld, a large-scale, multi-domain, multi-modal dataset specifically designed for 4D world modeling. OmniWorld consists of a newly collected OmniWorld-Game dataset and several curated public datasets spanning diverse domains. Compared with existing synthetic datasets, OmniWorld-Game provides richer modality coverage, larger scale, and more realistic dynamic interactions. Based on this dataset, we establish a challenging benchmark that exposes the limitations of current state-of-the-art (SOTA) approaches in modeling complex 4D environments. Moreover, fine-tuning existing SOTA methods on OmniWorld leads to significant performance gains across 4D reconstruction and video generation tasks, strongly validating OmniWorld as a powerful resource for training and evaluation. We envision OmniWorld as a catalyst for accelerating the development of general-purpose 4D world models, ultimately advancing machines' holistic understanding of the physical world.
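For a concrete sense of what a 4D-reconstruction benchmark entry typically measures, the sketch below computes the absolute relative depth error (AbsRel), a standard metric for comparing predicted depth against ground truth. This is a generic illustration of such a metric, not a statement of OmniWorld's exact evaluation protocol.

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray, min_depth: float = 1e-3) -> float:
    """Absolute relative depth error: mean(|pred - gt| / gt) over valid pixels."""
    valid = gt > min_depth  # ignore missing / invalid ground-truth depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy example: a prediction that is uniformly 10% too far yields AbsRel = 0.1.
gt = np.full((480, 640), 2.0, dtype=np.float32)
pred = gt * 1.1
print(f"AbsRel = {abs_rel(pred, gt):.3f}")  # -> AbsRel = 0.100
```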