Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video

Alexander Moore, Amar Saini, Kylie Cancilla, Doug Poland, Carmen Carrano

2025-07-02

Summary

This paper presents Training for X-Ray Vision, which focuses on amodal segmentation and amodal content completion using video captured from multiple cameras. The goal is to help AI systems infer the full shape and appearance of objects even when parts of them are hidden or blocked from view.

What's the problem?

In real-world scenes, objects are often partly occluded by other objects, which makes it hard for AI systems to recover their full shape and appearance. Standard computer vision methods typically segment only the visible (modal) parts of an object, missing information about the occluded regions.

What's the solution?

The researchers created MOVi-MC-AC, a large dataset of videos captured from multiple camera angles with detailed labels for both the visible and the occluded parts of objects. Training on this dataset teaches models to predict full object shapes and fill in missing content, improving their ability to recognize the same object consistently across different viewpoints.
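The relationship between visible (modal) masks and full (amodal) masks can be sketched with a toy example. The arrays below are illustrative values, not taken from MOVi-MC-AC:

```python
import numpy as np

# Toy 1x5 "scene": a box partially hidden behind an occluder.
# Amodal mask: the full extent of the box, including hidden pixels.
amodal = np.array([0, 1, 1, 1, 0], dtype=bool)
# Modal mask: only the box pixels actually visible to the camera.
modal = np.array([0, 1, 0, 0, 0], dtype=bool)

# The occluded region is the amodal mask minus the modal mask;
# amodal datasets label this region so models can learn to predict it.
occluded = amodal & ~modal

print(occluded.astype(int))  # [0 0 1 1 0]
```

In multi-camera data, a region occluded in one view may be visible in another, which is what makes cross-view supervision for amodal prediction possible.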

Why it matters?

This matters because better understanding of hidden parts of objects can improve AI in areas like robotics, self-driving cars, and surveillance, where knowing the complete shape and details of objects is important for making safe and smart decisions.

Abstract

MOVi-MC-AC is a large amodal segmentation and content completion dataset featuring multi-camera perspectives and ground-truth amodal content labels.