3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework
Tobias Sautter, Jan-Niklas Dihlmann, Hendrik P. A. Lensch
2025-12-22
Summary
This paper introduces a new system called 3D-RE-GEN that creates 3D scenes from a single image, aiming to make the results useful for artists working on visual effects and game development.
What's the problem?
Computers can now generate 3D scenes from images, but the results aren't very practical for artists to actually use. The generated scenes often contain objects that aren't separated correctly, are positioned inaccurately, or are missing background elements entirely. In short, existing methods don't produce the complete, easily editable scenes that artists need for their work.
What's the solution?
3D-RE-GEN tackles this by combining several existing AI models, each specialized for one part of the process: detecting objects, reconstructing them in 3D, and placing them in the scene. Occluded objects are handled as an image-editing task, with generative models 'filling in' the parts hidden from view, and the system builds a complete background that provides spatial context and realism. Finally, a differentiable optimization aligns each object with the estimated ground plane so that everything rests on the floor realistically.
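The paper's actual optimization isn't reproduced here, but the idea of a differentiable ground-plane alignment can be sketched in a few lines. This is a minimal illustration under two assumptions not stated above: that the four degrees of freedom are translation (tx, ty, tz) plus yaw, and that finite differences stand in for true automatic differentiation. All function names are hypothetical.

```python
import numpy as np

def transform(verts, params):
    """Apply a 4-DoF rigid transform: yaw about the y-axis, then translation."""
    tx, ty, tz, theta = params
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])          # rotation about the vertical (y) axis
    return verts @ R.T + np.array([tx, ty, tz])

def ground_loss(contact_verts, params, ground_y=0.0):
    """Mean squared distance of the object's contact vertices to the ground plane."""
    v = transform(contact_verts, params)
    return np.mean((v[:, 1] - ground_y) ** 2)

def optimize(contact_verts, steps=200, lr=0.5, eps=1e-5):
    """Gradient descent on the 4-DoF parameters.

    Finite differences approximate the gradient here; a real pipeline would
    use an autodiff framework instead.
    """
    p = np.zeros(4)
    for _ in range(steps):
        g = np.zeros(4)
        for i in range(4):
            dp = np.zeros(4)
            dp[i] = eps
            g[i] = (ground_loss(contact_verts, p + dp)
                    - ground_loss(contact_verts, p - dp)) / (2 * eps)
        p -= lr * g
    return p
```

With an object floating 0.4 units above the plane, the optimizer recovers a vertical translation of roughly -0.4; in the full system the loss would also include terms (e.g. from the reconstructed background) that constrain the horizontal and yaw components.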
Why does it matter?
This work is important because it significantly improves the quality of 3D scenes generated from single images, making them much more practical for artists. By creating scenes that are both visually appealing and easily modifiable, 3D-RE-GEN can speed up workflows in industries like visual effects and game development, allowing artists to focus on creative tasks rather than tedious reconstruction.
Abstract
Recent advances in 3D scene generation produce visually appealing output, but current representations hinder artists' workflows, which require modifiable 3D textured mesh scenes for visual effects and game development. Despite significant advances, current textured mesh scene reconstruction methods are far from artist-ready, suffering from incorrect object decomposition, inaccurate spatial relationships, and missing backgrounds. We present 3D-RE-GEN, a compositional framework that reconstructs a single image into textured 3D objects and a background. We show that combining state-of-the-art models from specific domains achieves state-of-the-art scene reconstruction performance, addressing artists' requirements. Our reconstruction pipeline integrates models for asset detection, reconstruction, and placement, pushing certain models beyond their originally intended domains. Obtaining occluded objects is treated as an image-editing task: generative models infer and reconstruct them with scene-level reasoning under consistent lighting and geometry. Unlike current methods, 3D-RE-GEN generates a comprehensive background that spatially constrains objects during optimization and provides a foundation for realistic lighting and simulation tasks in visual effects and games. To obtain physically realistic layouts, we employ a novel 4-DoF differentiable optimization that aligns reconstructed objects with the estimated ground plane. 3D-RE-GEN achieves state-of-the-art performance in single-image 3D scene reconstruction, producing coherent, modifiable scenes through compositional generation guided by precise camera recovery and spatial optimization.