SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision
Avigail Cohen Rimon, Amir Mann, Mirela Ben Chen, Or Litany
2026-03-26
Summary
This paper introduces a new method called SpectralSplats that makes tracking 3D objects in videos more reliable when using 3D Gaussian Splatting, a technique that produces very realistic renderings. Tracking stays robust even when the initial guess of the object's position is far off.
What's the problem?
Existing methods for tracking 3D objects represented with 3D Gaussian Splatting struggle when the camera's view is badly misaligned with the object. These methods work by comparing the rendered object with the real one pixel by pixel, so if the two don't overlap in the image, the signals used to guide the adjustment (the gradients) vanish entirely and the optimizer gets 'stuck', unable to correct the object's position. Essentially, the computer needs some overlap between the rendered object and the real object to know which way to move it, and without that overlap, tracking fails.
What's the solution?
SpectralSplats solves this by changing *how* the computer checks whether the tracking is correct. Instead of asking whether the pixels of the rendered object match the pixels of the real object (the spatial domain), it compares patterns in the image's frequencies (the frequency domain). Think of it like recognizing a song by its overall melody rather than by individual notes. Because these frequency patterns are global, the computer gets a useful signal even when the object isn't visible in the right place at all. The authors also developed a "frequency annealing" schedule that gradually shifts the focus back to precise spatial alignment as tracking improves, avoiding the spurious, periodic local minima that high frequencies would otherwise introduce.
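The core intuition can be seen in a small 1D toy experiment (a sketch of our own, not the paper's code: the blob renderer, loss functions, and positions here are illustrative assumptions). Two narrow blobs with zero spatial overlap give an essentially zero pixel-loss gradient, while a loss on a low-frequency Fourier coefficient still points toward the target:

```python
import numpy as np

def render(pos, n=256, sigma=1.0):
    # A narrow 1D "splat" with compact support, standing in for a
    # Gaussian primitive rendered at position `pos`.
    x = np.arange(n)
    return np.exp(-0.5 * ((x - pos) / sigma) ** 2)

def pixel_loss(p, q):
    # Standard photometric (spatial) objective: per-pixel squared error.
    return np.sum((render(p) - render(q)) ** 2)

def spectral_loss(p, q, k_max=2):
    # Compare a few low-frequency complex Fourier coefficients instead.
    # Keeping only the fundamental (k_max=2 slices k=0,1) gives a single
    # global basin over the whole domain; higher k would add periodic minima.
    Fp, Fq = np.fft.rfft(render(p)), np.fft.rfft(render(q))
    return np.sum(np.abs(Fp[:k_max] - Fq[:k_max]) ** 2)

def grad(loss, p, q, eps=1e-3):
    # Central finite difference w.r.t. the rendered position p.
    return (loss(p + eps, q) - loss(p - eps, q)) / (2 * eps)

p, q = 60.0, 180.0          # rendered vs. target position: zero overlap
g_pix = grad(pixel_loss, p, q)
g_spec = grad(spectral_loss, p, q)
# g_pix is ~0 (no overlap, no signal); g_spec is clearly negative,
# so gradient descent on it moves p toward q.
```

The same effect is what the paper exploits in 2D: the spectral objective supplies a directional gradient across the entire image, even with no pixel overlap.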
Why it matters?
This is important because it makes tracking 3D objects in videos much more robust and reliable. It allows for accurate tracking even in challenging situations where traditional methods fail, opening up possibilities for applications like virtual reality, augmented reality, and special effects where realistic and stable object tracking is crucial.
Abstract
3D Gaussian Splatting (3DGS) enables real-time, photorealistic novel view synthesis, making it a highly attractive representation for model-based video tracking. However, leveraging the differentiability of the 3DGS renderer "in the wild" remains notoriously fragile. A fundamental bottleneck lies in the compact, local support of the Gaussian primitives. Standard photometric objectives implicitly rely on spatial overlap; if severe camera misalignment places the rendered object outside the target's local footprint, gradients strictly vanish, leaving the optimizer stranded. We introduce SpectralSplats, a robust tracking framework that resolves this "vanishing gradient" problem by shifting the optimization objective from the spatial to the frequency domain. By supervising the rendered image via a set of global complex sinusoidal features (Spectral Moments), we construct a global basin of attraction, ensuring that a valid, directional gradient toward the target exists across the entire image domain, even when pixel overlap is completely nonexistent. To harness this global basin without introducing periodic local minima associated with high frequencies, we derive a principled Frequency Annealing schedule from first principles, gracefully transitioning the optimizer from global convexity to precise spatial alignment. We demonstrate that SpectralSplats acts as a seamless, drop-in replacement for spatial losses across diverse deformation parameterizations (from MLPs to sparse control points), successfully recovering complex deformations even from severely misaligned initializations where standard appearance-based tracking catastrophically fails.
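The abstract's "Frequency Annealing" can be illustrated with a toy schedule (our own hedged sketch; the paper derives its schedule from first principles, whereas the exponential ramp and parameter names below are assumptions for illustration): start supervising with only the lowest frequency, whose basin of attraction is global, then gradually admit higher frequencies for precise spatial alignment.

```python
def annealed_kmax(step, total_steps, k_min=1, k_cap=64):
    # Hypothetical exponential ramp: only the fundamental frequency at
    # step 0 (globally convex objective), up to k_cap frequencies at the
    # end (sharp, near-spatial alignment).
    t = step / max(total_steps - 1, 1)
    return int(round(k_min * (k_cap / k_min) ** t))

# e.g. pass annealed_kmax(step, total_steps) as the number of spectral
# moments supervised at each optimization step.
```

Any monotone schedule with the same endpoints would serve the same narrative purpose here; the paper's contribution is choosing the transition in a principled way.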