Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
Yara AlaaEldin, Francesca Odone
2025-03-26
Summary
This paper is about making drones better at seeing and understanding their surroundings in real time.
What's the problem?
Drones need to understand the 3D shape and what objects are around them to navigate safely, but doing this quickly and accurately is difficult.
What's the solution?
The researchers created a new AI system that can quickly estimate depth and identify objects in images taken from a drone's camera.
Why does it matter?
This work matters because it can help drones navigate more safely and autonomously, which is important for applications like delivery, inspection, and search and rescue.
Abstract
Understanding the geometric and semantic properties of the scene is crucial in autonomous navigation and particularly challenging in the case of Unmanned Aerial Vehicle (UAV) navigation. Such information may be obtained by estimating depth and semantic segmentation maps of the surrounding environment, and for practical use in autonomous navigation the procedure must be performed as close to real-time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly, and validate its effectiveness on the MidAir and Aeroscapes benchmark datasets. Our joint architecture proves competitive with or superior to other single-task and joint architectures while running fast, achieving 20.2 FPS on a single NVIDIA Quadro P5000 GPU with a low memory footprint. All code for training and prediction can be found at this link: https://github.com/Malga-Vision/Co-SemDepth
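The joint design the abstract describes (one network, two pixel-wise outputs from a single monocular image) can be sketched as a shared encoder feeding two task-specific decoder heads, so most of the computation is amortized across both tasks. The layers and sizes below are illustrative placeholders, not the paper's actual Co-SemDepth architecture:

```python
import torch
import torch.nn as nn

class JointDepthSeg(nn.Module):
    """Minimal sketch of a joint depth + segmentation network
    (shared encoder, two decoders); hypothetical layer choices."""

    def __init__(self, num_classes: int = 8):
        super().__init__()
        # Shared encoder: extracts features once for both tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Depth head: upsamples back and predicts one value per pixel.
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Segmentation head: predicts per-class logits per pixel.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        features = self.encoder(x)         # computed once, used twice
        depth = self.depth_head(features)  # (B, 1, H, W)
        seg = self.seg_head(features)      # (B, num_classes, H, W)
        return depth, seg

model = JointDepthSeg(num_classes=8)
image = torch.randn(1, 3, 128, 256)  # dummy aerial frame
depth, seg = model(image)
print(depth.shape, seg.shape)
# torch.Size([1, 1, 128, 256]) torch.Size([1, 8, 128, 256])
```

Sharing the encoder is what makes joint prediction faster than running two separate networks: the expensive feature extraction happens once, and only the lightweight heads are task-specific.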