Key Features

Depth Watertight Mesh representation
Simulated masking strategy for training data generation
Lightweight LoRA-based video diffusion adapter
High-quality, physically consistent, and temporally coherent video synthesis
Smooth camera transitions and consistent temporal dynamics
Extreme viewpoint synthesis capabilities
Efficient and scalable video synthesis
Potential applications in world generation, video production, and virtual reality

EX-4D's simulated masking strategy generates training data from monocular videos alone, eliminating the need for paired multi-view datasets. Because ordinary monocular footage is enough, the framework can learn from large video collections and still produce smooth camera transitions with consistent temporal dynamics. EX-4D also uses a lightweight LoRA-based video diffusion adapter that synthesizes high-quality videos with only 1% of the parameters trainable, which keeps it efficient and scalable.
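To give a feel for why a LoRA adapter leaves so few parameters trainable, here is a minimal sketch of the parameter arithmetic for a single linear layer. It relies only on the standard LoRA formulation (a frozen weight plus two low-rank factors). The layer sizes and rank are hypothetical values chosen for illustration and are not taken from EX-4D.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen, trainable) parameter counts for one LoRA-adapted linear layer."""
    frozen = d_in * d_out              # base weight W, kept frozen during training
    trainable = rank * (d_in + d_out)  # low-rank factors A (rank x d_in) and B (d_out x rank)
    return frozen, trainable


def trainable_fraction(d_in, d_out, rank):
    """Share of the layer's total parameters that the adapter actually trains."""
    frozen, trainable = lora_param_counts(d_in, d_out, rank)
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only (not EX-4D's settings).
    frac = trainable_fraction(d_in=4096, d_out=4096, rank=16)
    print(f"{frac:.2%}")  # prints 0.78%
```

With these assumed sizes, the adapter trains under 1% of the layer's parameters, the same order of magnitude as the figure quoted above. The real fraction depends on which layers are adapted and on the rank used.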


EX-4D has been demonstrated to generate high-quality 360° world videos from monocular input. The framework has been tested on various videos and has been shown to produce smooth camera transitions and consistent temporal dynamics across challenging viewpoint changes. Potential applications include world generation, video production, and virtual reality, and its ability to synthesize extreme viewpoints makes it a useful tool for content creators and researchers.
