InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Weipeng Zhong, Peizhou Cao, Yichen Jin, Li Luo, Wenzhe Cai, Jingli Lin, Hanqing Wang, Zhaoyang Lyu, Tai Wang, Bo Dai, Xudong Xu, Jiangmiao Pang

2025-09-16

Summary

This paper introduces InternScenes, a new and very large dataset of 3D indoor scenes designed to help researchers develop more intelligent robots and AI systems that can understand and interact with the real world.

What's the problem?

Currently, datasets used to train these AI systems are often too small, lack variety, or aren't realistic enough. Existing scenes tend to be overly neat, missing common small objects you'd find in a real home, and often have objects overlapping in ways that wouldn't physically happen. This makes it hard for AI to learn how to navigate and manipulate objects in a believable environment.

What's the solution?

The researchers created InternScenes by combining three different sources: real scans of indoor spaces, procedurally generated scenes, and scenes designed by artists. This resulted in around 40,000 different scenes containing almost two million 3D objects, covering 15 types of rooms and 288 different kinds of objects. They deliberately preserved large numbers of small items and used physics simulation to resolve cases where objects intersected each other, making the scenes more realistic and usable for training AI. They also made the scenes 'simulatable', meaning they can be loaded into a physics simulator so AI agents can practice in these environments before being deployed in the real world.
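To give a sense of what "resolving object collisions" involves, here is a minimal sketch of the simplest possible collision check: comparing axis-aligned bounding boxes of objects in a scene. This is an illustrative toy, not the paper's actual pipeline (which uses full physics simulation); all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AABB:
    """Axis-aligned bounding box: min/max corners as (x, y, z) tuples."""
    lo: tuple
    hi: tuple

def overlaps(a: AABB, b: AABB) -> bool:
    # Two boxes interpenetrate only if their extents overlap on every axis.
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))

def collision_pairs(boxes):
    # Naive O(n^2) pairwise check; a real pipeline would use a physics
    # engine with broad-phase pruning instead.
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if overlaps(boxes[i], boxes[j])]

# Hypothetical example: three small objects on a table, two of which
# interpenetrate and would need to be separated by the simulator.
mug_a = AABB((0.0, 0.0, 0.0), (0.1, 0.1, 0.1))
mug_b = AABB((0.05, 0.0, 0.0), (0.15, 0.1, 0.1))  # overlaps mug_a
mug_c = AABB((0.5, 0.0, 0.0), (0.6, 0.1, 0.1))    # clear of both

print(collision_pairs([mug_a, mug_b, mug_c]))  # → [(0, 1)]
```

Once offending pairs are found, a physics engine can let objects settle under gravity and contact forces until no interpenetrations remain, which is the kind of cleanup the paper describes.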

Why it matters?

InternScenes is important because it provides a much more challenging and realistic environment for training AI. By testing AI on this dataset, researchers can see how well it performs in complex situations and push the boundaries of what's possible in areas like robot navigation and understanding how objects are arranged in a room. The dataset is also being released publicly, so other researchers can benefit from it and build upon this work.

Abstract

The advancement of Embodied AI heavily relies on large-scale, simulatable 3D scene datasets characterized by scene diversity and realistic layouts. However, existing datasets typically suffer from limitations in data scale or diversity, sanitized layouts lacking small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes by integrating three disparate scene sources: real-world scans, procedurally generated scenes, and designer-created scenes, including 1.96M 3D objects and covering 15 common scene types and 288 object classes. We particularly preserve massive small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions by physical simulations. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both show the new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up the model training for both tasks, making the generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.