UniDet3D: Multi-dataset Indoor 3D Object Detection
Maksim Kolodiazhnyi, Anna Vorontsova, Matvey Skripkin, Danila Rukhovich, Anton Konushin
2024-09-10

Summary
This paper presents UniDet3D, a 3D object detection model for indoor environments that is trained on a mixture of datasets rather than on a single one.
What's the problem?
Taken individually, existing indoor datasets for 3D object detection are too small and not diverse enough to train a strong, general model. This limits how well such models perform across different indoor settings. Additionally, more general approaches built on foundation models still do not match the quality of models trained with supervision for this specific task.
What's the solution?
The authors propose UniDet3D, which combines data from various indoor datasets to create a more powerful and general model for detecting 3D objects. By unifying different labeling systems, they enable the model to learn effectively from multiple sources through a joint training process. The model is built on a simple transformer architecture, making it easy to use and adapt for practical applications. Their experiments show that UniDet3D significantly outperforms existing methods across six different benchmarks.
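To make the idea of unifying label spaces more concrete, here is a minimal sketch of one simple way to merge per-dataset class lists: take the union of class names and map each dataset's local class ids to ids in the shared space. This is an illustration only, not the authors' implementation; the function name `build_unified_label_space`, the dataset names, and the toy class lists are all hypothetical, and the paper's actual merging rules may differ.

```python
from typing import Dict, List, Tuple


def build_unified_label_space(
    dataset_classes: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, Dict[int, int]]]:
    """Merge per-dataset class lists into one label space, matching by name.

    Assumption: classes with the same (case-insensitive) name are the same
    category. Real datasets may need manual synonym handling.
    """
    unified: List[str] = []          # unified class names, index = unified id
    index_of: Dict[str, int] = {}    # normalized name -> unified id
    mappings: Dict[str, Dict[int, int]] = {}

    for dataset, classes in dataset_classes.items():
        local_to_unified: Dict[int, int] = {}
        for local_id, name in enumerate(classes):
            key = name.strip().lower()
            if key not in index_of:
                index_of[key] = len(unified)
                unified.append(key)
            local_to_unified[local_id] = index_of[key]
        mappings[dataset] = local_to_unified

    return unified, mappings


# Hypothetical toy label sets, not the real dataset taxonomies.
toy_classes = {
    "dataset_a": ["chair", "table", "bed"],
    "dataset_b": ["Table", "sofa", "chair"],
}
unified_classes, id_maps = build_unified_label_space(toy_classes)
```

With a mapping like this, ground-truth boxes from every dataset can be relabeled into the shared space before being mixed into joint training batches, so a single classification head can serve all datasets.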
Why does it matter?
This research is important because it enhances the capabilities of 3D object detection in indoor environments. By improving how models are trained with diverse data, it can lead to better performance in real-world applications like robotics and augmented reality, making technology more effective in understanding and interacting with our surroundings.
Abstract
Growing customer demand for smart solutions in robotics and augmented reality has attracted considerable attention to 3D object detection from point clouds. Yet, existing indoor datasets taken individually are too small and insufficiently diverse to train a powerful and general 3D object detection model. In the meantime, more general approaches utilizing foundation models are still inferior in quality to those based on supervised training for a specific task. In this work, we propose UniDet3D, a simple yet effective 3D object detection model, which is trained on a mixture of indoor datasets and is capable of working in various indoor environments. By unifying different label spaces, UniDet3D enables learning a strong representation across multiple datasets through a supervised joint training scheme. The proposed network architecture is built upon a vanilla transformer encoder, making it easy to run, customize and extend the prediction pipeline for practical use. Extensive experiments demonstrate that UniDet3D obtains significant gains over existing 3D object detection methods in 6 indoor benchmarks: ScanNet (+1.1 mAP50), ARKitScenes (+19.4 mAP25), S3DIS (+9.1 mAP50), MultiScan (+9.3 mAP50), 3RScan (+3.2 mAP50), and ScanNet++ (+2.7 mAP50). Code is available at https://github.com/filapro/unidet3d .