Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Aleksandar Jevtić, Christoph Reich, Felix Wimbauer, Oliver Hahn, Christian Rupprecht, Stefan Roth, Daniel Cremers

2025-07-09

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Summary

This paper talks about SceneDINO, a method that can understand and complete 3D scenes from a single image without needing any labeled examples. It learns by itself to figure out both the 3D shapes and the meaning of different parts of the scene.

What's the problem?

The problem is that most existing methods for understanding 3D scenes need a lot of human-labeled data, which is expensive and hard to get. This makes it difficult for AI to learn how to complete and interpret 3D scenes on its own.

What's the solution?

The researchers designed SceneDINO to use self-supervised learning techniques where it trains by looking at multiple views of the same scene and learns consistent 3D features. It then refines these features into meaningful groups that represent different parts of the scene, achieving accurate segmentation and 3D understanding without human labels.

Why it matters?

This matters because it helps AI understand complex 3D environments more efficiently and without relying on lots of annotated data. This can be useful for robotics, autonomous vehicles, and augmented reality, where understanding 3D space quickly and accurately is crucial.

Abstract

SceneDINO achieves state-of-the-art segmentation accuracy in unsupervised 3D scene completion by leveraging self-supervised learning and 3D feature distillation.

View Paper