AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction
Jiewen Chan, Zhenjun Zhao, Yu-Lun Liu
2026-01-05
Summary
This paper introduces AdaGaR, a method for reconstructing realistic 3D models of moving scenes from ordinary single-camera (monocular) videos. It aims to build these 3D scenes in a way that captures both fine appearance details and smooth, natural motion.
What's the problem?
Existing techniques struggle to do both at once. Methods built on plain Gaussian primitives act like low-pass filters and blur fine details, while more oscillatory primitives, such as standard Gabor functions, suffer from energy instability and can produce flickering or unnatural movement in the reconstructed scene. A further issue is that earlier approaches did not explicitly enforce that the motion be continuous and smooth over time.
What's the solution?
AdaGaR solves this with a new scene representation: Gaussian primitives extended with learnable frequency components (an adaptive Gabor representation), so each primitive can capture different levels of detail, while an adaptive energy-compensation scheme keeps the representation stable. To ensure smooth motion, each point's trajectory over time is modeled with cubic Hermite splines, combined with a temporal curvature regularizer that keeps the motion from bending too abruptly. Finally, an adaptive initialization step combines depth estimation, point tracking in the video, and foreground masks to give the 3D reconstruction a good starting point cloud.
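To make the motion model concrete, here is a minimal 1D sketch of a cubic Hermite spline segment together with a simple temporal-curvature penalty (mean squared second derivative over the segment). The function names and the exact penalty form are illustrative assumptions, not the paper's implementation; a trajectory model like this would presumably be applied per coordinate of each primitive.

```python
def hermite(p0, p1, m0, m1, t):
    # Cubic Hermite segment: interpolates p0 -> p1 for t in [0, 1]
    # with endpoint tangents m0, m1. Adjacent segments that share
    # positions and tangents join with C1 continuity.
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def curvature_penalty(p0, p1, m0, m1, n=16):
    # Illustrative temporal-curvature proxy: mean squared second
    # derivative of the cubic, sampled on [0, 1]. Penalizing it
    # discourages abrupt changes in acceleration between keyframes.
    def accel(t):
        # Second derivatives of the Hermite basis functions.
        return ((12 * t - 6) * p0 + (6 * t - 4) * m0
                + (-12 * t + 6) * p1 + (6 * t - 2) * m1)
    return sum(accel(i / (n - 1)) ** 2 for i in range(n)) / n
```

For straight-line motion (e.g. p0=0, p1=1, m0=m1=1) the second derivative is identically zero, so the penalty vanishes; the more the trajectory's acceleration varies, the larger the penalty, which is the kind of abrupt motion the regularizer is meant to suppress.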
Why it matters?
This research matters because it substantially improves the quality of dynamic 3D reconstruction from video. AdaGaR outperforms previous methods on standard benchmarks and generalizes to a range of applications: interpolating new frames in a video, producing temporally consistent depth maps, editing videos, and synthesizing views of the scene from new angles such as stereo pairs.
Abstract
Reconstructing dynamic 3D scenes from monocular videos requires simultaneously capturing high-frequency appearance details and temporally continuous motion. Existing methods using single Gaussian primitives are limited by their low-pass filtering nature, while standard Gabor functions introduce energy instability. Moreover, lack of temporal continuity constraints often leads to motion artifacts during interpolation. We propose AdaGaR, a unified framework addressing both frequency adaptivity and temporal continuity in explicit dynamic scene modeling. We introduce Adaptive Gabor Representation, extending Gaussians through learnable frequency weights and adaptive energy compensation to balance detail capture and stability. For temporal continuity, we employ Cubic Hermite Splines with Temporal Curvature Regularization to ensure smooth motion evolution. An Adaptive Initialization mechanism combining depth estimation, point tracking, and foreground masks establishes stable point cloud distributions in early training. Experiments on Tap-Vid DAVIS demonstrate state-of-the-art performance (PSNR 35.49, SSIM 0.9433, LPIPS 0.0723) and strong generalization across frame interpolation, depth consistency, video editing, and stereo view synthesis. Project page: https://jiewenchan.github.io/AdaGaR/
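The abstract's contrast between a Gaussian's low-pass behavior and a Gabor atom, and the need for energy compensation, can be illustrated in 1D. This is a minimal sketch under simplifying assumptions (a single 1D atom, numeric L2 normalization); the names and the exact compensation rule are illustrative, not AdaGaR's actual formulation.

```python
import math

def gaussian(x, sigma=1.0):
    # Plain Gaussian envelope: smooth and low-pass, so it blurs fine detail.
    return math.exp(-x * x / (2.0 * sigma * sigma))

def gabor(x, sigma=1.0, omega=0.0, phase=0.0):
    # Gabor atom: the same envelope modulated by a cosine carrier.
    # omega = 0 recovers the plain Gaussian; larger omega adds
    # high-frequency oscillation inside the same support.
    return gaussian(x, sigma) * math.cos(omega * x + phase)

def energy(f, lo=-5.0, hi=5.0, n=2001):
    # Trapezoid estimate of the L2 energy of f on [lo, hi].
    step = (hi - lo) / (n - 1)
    vals = [f(lo + i * step) ** 2 for i in range(n)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def energy_gain(sigma=1.0, omega=0.0, phase=0.0):
    # Gain that restores the unmodulated envelope's energy after
    # modulation -- a stand-in for adaptive energy compensation
    # (the exact form used by AdaGaR is not reproduced here).
    return math.sqrt(energy(lambda u: gaussian(u, sigma))
                     / energy(lambda u: gabor(u, sigma, omega, phase)))
```

For a fast carrier the squared cosine averages to about 1/2, so the gain approaches sqrt(2); rescaling the modulated atom this way keeps its energy comparable to the plain Gaussian's, which is the stability concern the abstract raises about standard Gabor functions.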