VeGaS: Video Gaussian Splatting

Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek

2024-11-19

Summary

This paper introduces VeGaS, a method that represents video with Gaussian splatting, enabling realistic edits to video data.

What's the problem?

While Implicit Neural Representations (INRs) can compress video data effectively, they are poorly suited for editing because they lack the flexibility needed to make detailed changes. Methods like the Video Gaussian Representation (VGR) can instead encode a video as a collection of 3D Gaussians, but they support only a limited set of basic edits.

What's the solution?

To overcome these limitations, the authors developed VeGaS, which is built on a new family of Folded-Gaussian distributions designed to capture the nonlinear dynamics of a video stream. Each frame is then modeled by 2D Gaussians obtained as conditional distributions of these Folded-Gaussians at the frame's time index, enabling more realistic and detailed modifications. The authors' experiments show that VeGaS outperforms existing methods in frame reconstruction and allows more realistic edits.
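To give a feel for the "conditioning on time" step, here is a minimal sketch of the textbook case: slicing an ordinary 3D Gaussian over (t, x, y) at a fixed time t to obtain a 2D spatial Gaussian for that frame. This uses the standard conditional multivariate-normal formulas; VeGaS itself replaces the plain Gaussian with its Folded-Gaussian family to handle nonlinear motion, so this is an illustrative simplification, not the paper's actual model.

```python
import numpy as np

def condition_gaussian_on_time(mu, Sigma, t):
    """Slice a 3D Gaussian over (t, x, y) at time t.

    mu:    mean vector, shape (3,), ordered (t, x, y)
    Sigma: covariance matrix, shape (3, 3)
    t:     the frame time to condition on

    Returns the mean (2,) and covariance (2, 2) of the
    conditional 2D Gaussian over (x, y) at that time.
    """
    mu_t, mu_xy = mu[0], mu[1:]
    s_tt = Sigma[0, 0]      # variance along the time axis (scalar)
    s_xyt = Sigma[1:, 0]    # time/space cross-covariance, shape (2,)
    S_xy = Sigma[1:, 1:]    # spatial covariance block, shape (2, 2)

    gain = s_xyt / s_tt
    # Standard conditional-Gaussian formulas:
    mu_cond = mu_xy + gain * (t - mu_t)
    Sigma_cond = S_xy - np.outer(gain, s_xyt)
    return mu_cond, Sigma_cond

# Example: a Gaussian whose x-position is correlated with time,
# so conditioning at a later t shifts the 2D mean in x.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 2.0, 0.3],
    [0.0, 0.3, 1.0],
])
mu_f, Sigma_f = condition_gaussian_on_time(mu, Sigma, t=1.0)
print(mu_f)     # [1.5 2. ]
print(Sigma_f)  # [[1.75 0.3 ], [0.3  1.  ]]
```

The time-space correlation in the covariance is what makes the 2D slice move across frames; intuitively, the Folded-Gaussians in VeGaS let that motion follow curved rather than only linear trajectories.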

Why it matters?

This research is significant because it enhances the capabilities of video editing technology, making it easier for creators to modify videos in realistic ways. By improving how videos are processed and edited, VeGaS can be beneficial in various fields such as filmmaking, animation, and content creation, allowing for more creative expression and higher-quality outputs.

Abstract

Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.