I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

Lu Ling, Yunhao Ge, Yichen Sheng, Aniket Bera

2025-12-16

Summary

This paper explores how to generate more realistic and flexible 3D scenes, focusing on enabling the system to build scenes it hasn't specifically been trained on.

What's the problem?

Current programs that generate 3D scenes learn from a limited set of examples, so they struggle to create scenes with arrangements or objects different from those they've seen before. They're essentially stuck with the layouts they were taught and can't 'imagine' new ones.

What's the solution?

The researchers took a program already good at creating individual 3D objects and 'retrained' it to understand how objects relate to each other in a scene. Instead of showing it lots of complete scenes, they focused on teaching it the rules of spatial arrangement – things like objects being close together, supported by other objects, or arranged symmetrically. They did this even with randomly assembled scenes, proving the program could learn these spatial relationships from geometry alone, and then used a 'view-centric' approach to make the process efficient.
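The paper describes these spatial rules only at a high level, but as a rough illustration of what "inferring relations from geometry alone" can mean, simple relations like proximity and support can be read directly off object bounding boxes. The `Box`, `is_near`, and `is_supported_by` names below are hypothetical and not from the paper; this is a toy sketch, not the authors' method, which learns such relations implicitly inside a generative model.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Axis-aligned bounding box: center (cx, cy, cz) and size (w, h, d),
    # with y as the vertical axis.
    cx: float; cy: float; cz: float
    w: float; h: float; d: float

def is_near(a: Box, b: Box, thresh: float = 0.5) -> bool:
    # Proximity: horizontal distance between centers below a threshold.
    return ((a.cx - b.cx) ** 2 + (a.cz - b.cz) ** 2) ** 0.5 < thresh

def is_supported_by(a: Box, b: Box, tol: float = 0.05) -> bool:
    # Support: the bottom of `a` rests on the top of `b`,
    # and the two boxes overlap in the horizontal plane.
    top_b = b.cy + b.h / 2
    bottom_a = a.cy - a.h / 2
    overlaps = (abs(a.cx - b.cx) < (a.w + b.w) / 2
                and abs(a.cz - b.cz) < (a.d + b.d) / 2)
    return abs(bottom_a - top_b) < tol and overlaps

# Toy scene: a lamp sitting on a table.
table = Box(0.0, 0.4, 0.0, 1.2, 0.8, 0.6)
lamp = Box(0.1, 1.0, 0.0, 0.2, 0.4, 0.2)
print(is_supported_by(lamp, table))  # the lamp rests on the table
```

Such hand-written predicates are exactly what the paper avoids: instead of coding rules like these, the retrained instance generator is shown to infer them from geometric cues on its own.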

Why it matters?

This work is important because it shows that a computer can learn a general understanding of 3D space, rather than just memorizing specific scenes. This is a step towards creating 'foundation models' for 3D, meaning a single program could be used to understand and generate a wide variety of 3D environments, which is crucial for things like virtual reality, robotics, and game development.

Abstract

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, restricting generalization to new layouts. We instead reprogram a pre-trained 3D instance generator to act as a scene-level learner, replacing dataset-bounded supervision with model-centric spatial supervision. This reprogramming unlocks the generator's transferable spatial knowledge, enabling generalization to unseen layouts and novel object compositions. Remarkably, spatial reasoning still emerges even when the training scenes are random compositions of objects. This demonstrates that the generator's transferable scene prior provides a rich learning signal for inferring proximity, support, and symmetry from purely geometric cues. Replacing the widely used canonical space, we instantiate this insight with a view-centric formulation of the scene space, yielding a fully feed-forward, generalizable scene generator that learns spatial relations directly from the instance model. Quantitative and qualitative results show that a 3D instance generator is an implicit spatial learner and reasoner, pointing toward foundation models for interactive 3D scene understanding and generation. Project page: https://luling06.github.io/I-Scene-project/