LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu
2025-08-20

Summary
This paper introduces LongSplat, a new technique for reconstructing 3D scenes from long videos in which the camera moves around a lot and its positions are unknown. It is designed to handle these tricky situations better than current methods.
What's the problem?
When reconstructing a large scene from a long video with shaky or unpredictable camera movement, existing methods often struggle: the estimated camera positions drift, the initial 3D geometry comes out wrong, and the computer runs out of memory because the scene is so big.
What's the solution?
LongSplat tackles these problems by coupling pose estimation with 3D scene building. It simultaneously figures out where the camera was at each moment and builds the scene using 3D Gaussian Splatting. Refining the camera poses and the 3D scene together, instead of one after the other, helps it avoid locking in early mistakes, and a pose estimation module built on learned 3D priors keeps tracking robust. It also groups dense points into a sparse set of anchors using an octree, which keeps memory use manageable. Together, these let it handle large scales and uncertain camera movements much better.
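The "refine poses and the scene together, one frame at a time" idea can be illustrated with a deliberately tiny 1-D toy. This is a hedged sketch, not the paper's method: the scene is just 8 numbers standing in for Gaussian centers, each frame observes them shifted by an unknown camera offset, and all names and values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_scene = rng.uniform(-1.0, 1.0, 8)          # stand-in for Gaussian centers
true_poses = np.cumsum(rng.normal(0, 0.1, 5))   # smooth 1-D camera trajectory
frames = [true_scene + p + rng.normal(0, 0.01, 8) for p in true_poses]

scene = np.zeros(8)   # current scene estimate
poses = []            # current pose estimates
lr = 0.05
for obs in frames:
    # Incremental step: warm-start the new frame's pose from the last one.
    poses.append(poses[-1] if poses else 0.0)
    # Joint refinement: update every pose and the scene together,
    # rather than freezing one while fitting the other.
    for _ in range(200):
        grad_scene = np.zeros(8)
        for i, f in enumerate(frames[:len(poses)]):
            r = scene + poses[i] - f       # residual of frame i
            poses[i] -= lr * r.mean()      # pose gradient step
            grad_scene += r / len(poses)
        scene -= lr * grad_scene

# Mean squared residual after fitting all frames.
err = np.mean([np.mean((scene + p - f) ** 2) for p, f in zip(poses, frames)])
```

Because each new pose is initialized from the previous frame and then refined jointly with the scene, the toy converges to a consistent fit instead of drifting frame by frame.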
Why it matters?
This work is important because it allows us to create realistic 3D models of complex environments from everyday videos, even when the camera isn't perfectly controlled. This could be used for things like virtual reality, augmented reality, or even just creating better digital twins of real-world places.
Abstract
LongSplat addresses critical challenges in novel view synthesis (NVS) from casually captured long videos characterized by irregular camera motion, unknown camera poses, and expansive scenes. Current methods often suffer from pose drift, inaccurate geometry initialization, and severe memory limitations. To address these issues, we introduce LongSplat, a robust unposed 3D Gaussian Splatting framework featuring: (1) Incremental Joint Optimization that concurrently optimizes camera poses and 3D Gaussians to avoid local minima and ensure global consistency; (2) a robust Pose Estimation Module leveraging learned 3D priors; and (3) an efficient Octree Anchor Formation mechanism that converts dense point clouds into anchors based on spatial density. Extensive experiments on challenging benchmarks demonstrate that LongSplat achieves state-of-the-art results, substantially improving rendering quality, pose accuracy, and computational efficiency compared to prior approaches. Project page: https://linjohnss.github.io/longsplat/
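The octree anchor idea from (3) can be sketched in a few lines: voxelize the dense point cloud at a fixed octree depth and keep one anchor per sufficiently dense leaf. The function name, depth, and density threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def octree_anchors(points, depth=6, min_points=4):
    """Convert a dense point cloud into sparse anchors: one anchor per
    occupied leaf voxel of a uniform octree, kept only if the voxel is
    dense enough. (Illustrative sketch, not the paper's implementation.)"""
    lo, hi = points.min(axis=0), points.max(axis=0)
    res = 2 ** depth                      # voxels per axis at this depth
    cell = (hi - lo) / res
    safe = np.where(cell > 0, cell, 1.0)  # guard against a flat axis
    # Map each point to its voxel index at the chosen octree depth.
    idx = np.clip(((points - lo) / safe).astype(int), 0, res - 1)
    keys, inv, counts = np.unique(idx, axis=0,
                                  return_inverse=True, return_counts=True)
    # Keep only voxels dense enough to deserve an anchor, and place the
    # anchor at the mean of the points that fall inside it.
    anchors = [points[inv == k].mean(axis=0)
               for k in np.flatnonzero(counts >= min_points)]
    return np.asarray(anchors)

# Dense cluster near the origin plus a few scattered outliers.
rng = np.random.default_rng(0)
cloud = np.concatenate([rng.normal(0, 0.05, (1000, 3)),
                        rng.uniform(-1, 1, (20, 3))])
anchors = octree_anchors(cloud)
```

Sparse outlier points fall below the density threshold and produce no anchors, so the anchor set is far smaller than the raw cloud, which is the memory-saving effect the abstract describes.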