Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting
Hao-Jen Chien, Yi-Chuan Huang, Chung-Ho Wu, Wei-Lun Chao, Yu-Lun Liu
2025-12-05
Summary
This paper introduces a new technique, called Splannequin, for creating highly detailed, frozen-in-time 3D scenes from ordinary videos, like those you might take with your phone. The goal isn't to recreate the movement, but to let you pick *any* moment in the video and see a high-quality 3D snapshot of the scene at that instant.
What's the problem?
When you try to build a 3D scene from a single video, it's tricky to get it right, especially when things move or get hidden behind other objects. This causes problems like 'ghosting' – where you see blurry trails of objects – or parts of the scene just disappearing because the camera didn't see them well enough at certain times. Existing methods struggle to create a clear, stable 3D model when the video doesn't have a lot of information about the scene's structure.
What's the solution?
Splannequin tackles this by using a technique called dynamic Gaussian splatting, which represents the scene as a collection of soft, colored blobs called Gaussians. The key is how it handles those Gaussians when they're not clearly visible. If a Gaussian is hidden from the camera or renders as a blurry artifact, the system 'anchors' it: hidden ones are tied to their most recent well-observed past state, while defective ones are tied to a future state with stronger supervision. This prevents ghosting and keeps the scene stable, all without changing the core way the 3D scene is built. It's like giving the points a memory of where they were or where they're going.
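The anchoring idea above can be sketched as a simple regularization loss. This is a minimal illustration, not the paper's implementation: the function name `anchoring_loss`, the mask inputs, and the squared-distance penalty are all assumptions for clarity; the actual method detects hidden and defective states inside a dynamic Gaussian splatting pipeline.

```python
import numpy as np

def anchoring_loss(pos_t, pos_past, pos_future, hidden_mask, defective_mask):
    """Hypothetical sketch of temporal anchoring for Gaussian centers.

    pos_t:          (N, 3) Gaussian centers at a weakly supervised timestamp
    pos_past:       (N, 3) centers at the most recent well-observed past state
    pos_future:     (N, 3) centers at a future state with stronger supervision
    hidden_mask:    (N,) bool, Gaussians currently unobserved by the camera
    defective_mask: (N,) bool, Gaussians producing ghosting/blur artifacts
    """
    # Hidden Gaussians are pulled toward their last well-observed past state.
    loss_hidden = np.sum((pos_t[hidden_mask] - pos_past[hidden_mask]) ** 2)
    # Defective Gaussians are pulled toward better-supervised future states.
    loss_defective = np.sum((pos_t[defective_mask] - pos_future[defective_mask]) ** 2)
    return loss_hidden + loss_defective
```

Because the penalty is just an extra loss term on existing parameters, it could be added to a training loop without touching the model architecture, which matches the paper's claim of zero inference overhead.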
Why does it matter?
This research is important because it makes it much easier to create high-quality 3D models from everyday videos. Imagine being able to freeze a moment from a home video and walk around it in 3D! It opens up possibilities for virtual reality experiences, special effects, and even preserving memories in a more immersive way. In a user study, participants preferred Splannequin's results over previous methods 96% of the time.
Abstract
Synthesizing high-fidelity frozen 3D scenes from monocular Mannequin-Challenge (MC) videos is a unique problem distinct from standard dynamic scene reconstruction. Instead of focusing on modeling motion, our goal is to create a frozen scene while strategically preserving subtle dynamics to enable user-controlled instant selection. To achieve this, we introduce a novel application of dynamic Gaussian splatting: the scene is modeled dynamically, which retains nearby temporal variation, and a static scene is rendered by fixing the model's time parameter. However, under this usage, monocular capture with sparse temporal supervision introduces artifacts like ghosting and blur for Gaussians that become unobserved or occluded at weakly supervised timestamps. We propose Splannequin, an architecture-agnostic regularization that detects two states of Gaussian primitives, hidden and defective, and applies temporal anchoring. Under predominantly forward camera motion, hidden states are anchored to their recent well-observed past states, while defective states are anchored to future states with stronger supervision. Our method integrates into existing dynamic Gaussian pipelines via simple loss terms, requires no architectural changes, and adds zero inference overhead. This results in markedly improved visual quality, enabling high-fidelity, user-selectable frozen-time renderings, validated by a 96% user preference. Project page: https://chien90190.github.io/splannequin/
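The abstract's core rendering idea, modeling the scene dynamically but rendering it with the time parameter fixed, can be illustrated with a toy stand-in. The linear drift model below is purely an assumption for illustration; real pipelines use a learned deformation field over Gaussian parameters, not a hand-written velocity term.

```python
import numpy as np

def deform(base_centers, velocities, t):
    # Toy dynamic model: Gaussian centers drift linearly with time.
    # (Illustrative assumption; the paper's model is a learned dynamic field.)
    return base_centers + velocities * t

base = np.zeros((4, 3))       # canonical Gaussian centers
vel = np.full((4, 3), 0.1)    # per-Gaussian motion

# Rendering a "frozen" scene amounts to evaluating the dynamic model at one
# fixed, user-chosen instant; the camera can then move freely around that
# static snapshot while the model itself is unchanged.
t_frozen = 0.5
frozen_centers = deform(base, vel, t_frozen)
```

Keeping the model dynamic while freezing only the query time is what lets a user scrub to any instant in the clip and still get a stable static reconstruction.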