MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second

Chenguo Lin, Yuchen Lin, Panwang Pan, Yifan Yu, Honglei Yan, Katerina Fragkiadaki, Yadong Mu

2025-07-15

Summary

This paper introduces MoVieS, a new model that creates 4D dynamic views from a single monocular video in just one second. It uses 3D shapes called Gaussian primitives to jointly represent how things look, their 3D geometry, and how they move, all within one system.

What's the problem?

The problem addressed is how to quickly and accurately reconstruct dynamic 3D views that capture appearance, geometry, and motion from monocular videos alone, a setting where previous methods were either slow or required complicated multi-camera setups and per-scene optimization.

What's the solution?

To solve this, MoVieS uses a fast feed-forward network that links each pixel in the video to 3D Gaussian primitives, which move over time. It predicts depth, appearance, and motion in a unified way using a large-scale pretrained model with attention mechanisms, enabling rapid and coherent 4D scene reconstruction and novel view synthesis.
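The core idea of pixel-aligned, time-varying Gaussian primitives can be illustrated with a minimal sketch: lift every pixel to a 3D Gaussian center using a predicted depth map, then displace those centers over time with per-pixel motion vectors. This is an illustrative simplification, not the paper's implementation; the function names, the pinhole-camera intrinsics, and the linear motion model are all assumptions made for clarity.

```python
import numpy as np

def unproject_pixels(depth, fx, fy, cx, cy):
    """Lift each pixel to a 3D point (a Gaussian center) using predicted depth.

    Assumes a simple pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy). `depth` is an (H, W) array of per-pixel depths.
    Returns an (H, W, 3) array of 3D centers in camera coordinates.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (us - cx) / fx * depth
    y = (vs - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def centers_at_time(centers0, motion, t):
    """Displace Gaussian centers with per-pixel motion vectors at time t.

    A linear motion model is assumed here purely for illustration; the actual
    model predicts motion jointly with depth and appearance in one network pass.
    """
    return centers0 + t * motion

# Toy usage: a 2x2 image at unit depth, with zero motion.
depth = np.ones((2, 2))
centers = unproject_pixels(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
moved = centers_at_time(centers, np.zeros_like(centers), t=0.5)
```

Because every pixel owns one primitive, the network's dense per-pixel outputs (depth, color, motion) map directly onto the Gaussian parameters, which is what makes a single feed-forward pass sufficient for reconstruction.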

Why it matters?

This matters because it speeds up dynamic 4D view synthesis by a large margin while maintaining accuracy, and its unified motion representation also supports related tasks like point tracking and scene flow estimation. It advances how we understand and recreate complex dynamic 3D scenes from ordinary single-camera video.

Abstract

MoVieS synthesizes 4D dynamic novel views from monocular videos using pixel-aligned Gaussian primitives, enabling unified appearance, geometry, and motion modeling within a single framework.