EX-4D includes a simulated masking strategy that generates effective training data directly from monocular videos, eliminating the need for paired multi-view datasets. This lets the framework train on large collections of ordinary videos while still learning to produce smooth camera transitions and consistent temporal dynamics. In addition, EX-4D attaches a lightweight LoRA-based adapter to a video diffusion backbone, synthesizing high-quality videos with only about 1% of the parameters trainable, which keeps the method efficient and scalable.
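To make the "1% trainable parameters" idea concrete, here is a minimal sketch of the general LoRA technique the adapter builds on: the pretrained weight is frozen and only a low-rank update is trained. The class name, rank, and scaling values below are illustrative assumptions, not EX-4D's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), where only A and B are trained.
    (Illustrative sketch; EX-4D's actual adapter details may differ.)"""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# For a 1024x1024 layer, the low-rank branch is a small fraction of the total.
layer = LoRALinear(nn.Linear(1024, 1024), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")
```

Wrapping every attention projection of a diffusion backbone this way yields a trainable-parameter budget on the order of a percent, which is why LoRA-style adapters scale well to large video models.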
EX-4D has been demonstrated to generate high-quality 360° world videos from monocular input. Across a range of test videos, it has been shown to produce smooth camera transitions and consistent temporal dynamics even under challenging viewpoint changes. These capabilities make it applicable to world generation, video production, and virtual reality, and its ability to render extreme viewpoints makes it a valuable tool for content creators and researchers alike.