NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Yuxue Yang, Lue Fan, Ziqi Shi, Junran Peng, Feng Wang, Zhaoxiang Zhang

2026-01-05

Summary

This paper introduces NeoVerse, a 4D world model built to scale to ordinary single-camera videos. It can reconstruct dynamic scenes in 4D (3D plus time), generate videos along new camera trajectories, and support a range of downstream applications.

What's the problem?

Existing methods for building 4D world models are often difficult to use with everyday videos. They usually require a lot of high-quality footage taken from multiple angles, or they need a lot of complicated preparation before training can even begin. This makes them hard to scale up and apply to real-world situations where you only have a single camera view.

What's the solution?

NeoVerse addresses this by designing the whole pipeline to scale to regular, single-camera videos. It rests on a few key ideas: it reconstructs 4D scenes in a feed-forward pass without needing known camera poses, it simulates on the fly the degradation patterns associated with monocular input, and it combines these with other techniques chosen to work well together. As a result, NeoVerse does not depend on specialized multi-view data or heavy pre-processing.

Why does it matter?

NeoVerse matters because it makes 4D world modeling more accessible and practical. Because it works with standard videos, it can be applied in a wider range of settings, such as robotics, virtual reality, and realistic simulation. The authors also report state-of-the-art performance on standard reconstruction and generation benchmarks.

Abstract

In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse is built upon a core philosophy that makes the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs empower NeoVerse with versatility and generalization to various domains. Meanwhile, NeoVerse achieves state-of-the-art performance in standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io
