Vista4D: Video Reshooting with 4D Point Clouds
Kuan Heng Lin, Zhizheng Liu, Pablo Salamanca, Yash Kant, Ryan Burgert, Yuancheng Xu, Koichi Namekata, Yiwei Zhao, Bolei Zhou, Micah Goldblum, Paul Debevec, Ning Yu
2026-04-24
Summary
This paper introduces Vista4D, a new system for changing the viewpoint of a video after it's been filmed, essentially 'reshooting' it from a different angle.
What's the problem?
Existing methods for changing a video's perspective struggle with real-world videos that contain moving objects. They often produce blurry or distorted images, fail to faithfully reproduce the original appearance, and have trouble following a new camera path precisely. Changing the camera angle after the fact is especially hard when parts of the scene are in motion.
What's the solution?
Vista4D solves this by creating a detailed 3D model of the scene that includes how things move over time, a '4D point cloud'. It identifies which parts of the scene are static and reconstructs the dynamic parts, ensuring the original content is preserved. The system is trained on reconstructed multiview recordings of dynamic scenes, which makes it robust to imperfections in the 3D model. This allows it to realistically resynthesize the video from a new camera angle.
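To make the idea of re-rendering a point cloud from a new camera concrete, here is a minimal sketch of pinhole projection, the geometric step such systems rely on. All names and conventions below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project world-space 3D points into a pinhole camera (illustrative sketch).

    points_3d: (N, 3) points for one video frame
    K: (3, 3) intrinsics, R: (3, 3) rotation, t: (3,) translation
    Returns (N, 2) pixel coordinates and (N,) depths.
    """
    cam = points_3d @ R.T + t           # world -> camera coordinates
    depth = cam[:, 2]                   # distance along the optical axis
    proj = cam @ K.T                    # apply intrinsics
    pixels = proj[:, :2] / depth[:, None]  # perspective divide
    return pixels, depth

# Toy example: one point 5 units straight ahead of an identity camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pix, d = project_points(np.array([[0.0, 0.0, 5.0]]), K, R, t)
# A point on the optical axis lands at the principal point (320, 240).
```

Repeating this projection for every frame's point positions, under the target camera trajectory, yields the dense conditioning signal from which a generative model can resynthesize the video.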
Why does it matter?
This research is important because it opens up possibilities for editing videos in new ways. Imagine being able to expand a scene, change the camera angle to reveal hidden details, or completely recompose a video after it's already been recorded. It has practical applications in areas like visual effects and creating immersive experiences.
Abstract
We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth estimation artifacts of real-world dynamic videos, while also failing to preserve content appearance and failing to maintain precise camera control for challenging new trajectories. We build a 4D-grounded point cloud representation with static pixel segmentation and 4D reconstruction to explicitly preserve seen content and provide rich camera signals, and we train with reconstructed multiview dynamic data for robustness against point cloud artifacts during real-world inference. Our results demonstrate improved 4D consistency, camera control, and visual quality compared to state-of-the-art baselines under a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: https://eyeline-labs.github.io/Vista4D
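The static pixel segmentation mentioned in the abstract can be sketched with a simple heuristic: a tracked 3D point is static if its position barely changes across frames. This is purely an illustrative assumption (the threshold, function name, and criterion are not the paper's actual method):

```python
import numpy as np

def static_mask(tracks, threshold=0.05):
    """Label tracked 3D points as static or dynamic (illustrative heuristic).

    tracks: (T, N, 3) array of N point positions over T frames.
    A point counts as static if its maximum displacement from its
    per-track mean position stays under `threshold` (scene units).
    """
    mean_pos = tracks.mean(axis=0)                     # (N, 3) mean position
    disp = np.linalg.norm(tracks - mean_pos, axis=-1)  # (T, N) displacements
    return disp.max(axis=0) < threshold                # (N,) boolean mask

# Toy example: one stationary point and one point sliding along x.
T = 10
still = np.zeros((T, 1, 3))
moving = np.stack([np.linspace(0.0, 1.0, T),
                   np.zeros(T), np.zeros(T)], axis=-1)[:, None]
tracks = np.concatenate([still, moving], axis=1)       # (T, 2, 3)
mask = static_mask(tracks)
# mask -> [True, False]: the first point is static, the second dynamic.
```

Splitting the cloud this way lets static geometry be reconstructed once and reused across frames, while dynamic points carry per-frame positions.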