Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach
Matthias Bartolo, Dylan Seychell, Gabriel Hili, Matthew Montebello, Carl James Debono, Saviour Formosa, Konstantinos Makantasis
2026-01-09
Summary
This paper explores a way to make object detection, which is how computers 'see' objects in images, more accurate by using extra information during training that isn't available when the system is actually used.
What's the problem?
Object detection systems can struggle with accuracy, especially when dealing with complex scenes or objects that are hard to distinguish. Current systems often only use the image itself to learn, missing out on potentially helpful information that could make them better at identifying objects. Imagine trying to find something in a dark room versus having a flashlight – the flashlight is like the extra information this paper tries to provide.
What's the solution?
The researchers used a technique called 'Learning Using Privileged Information,' or LUPI. Essentially, they created a 'teacher' model that *does* have access to extra information, such as detailed outlines of objects (bounding box masks), saliency maps highlighting the most important parts of an image, or even depth information. This teacher model then guides a 'student' model, which only sees the regular image, to learn more effectively. They tested this approach with five different object detection systems and on several datasets, including images taken from drones looking for litter.
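The teacher-student idea can be illustrated with a minimal sketch. This is not the authors' code; the toy networks, shapes, and loss terms are all illustrative assumptions. The teacher receives the image plus a privileged channel (here, a box mask), the student receives only the image, and the student's loss mixes its own task loss with a term that pulls its features toward the teacher's:

```python
# Minimal, hypothetical sketch of LUPI-style teacher-student training.
# The teacher sees image + privileged channel; the student sees only the image.
import torch
import torch.nn as nn

class SmallDetectorBackbone(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        # Toy per-cell head: 4 box coordinates + 1 objectness score.
        self.head = nn.Conv2d(16, 5, 1)

    def forward(self, x):
        f = self.features(x)
        return f, self.head(f)

# Teacher: RGB image + 1 privileged mask channel. Student: RGB only.
teacher = SmallDetectorBackbone(in_channels=4)
student = SmallDetectorBackbone(in_channels=3)

image = torch.randn(2, 3, 32, 32)     # batch of images
priv_mask = torch.rand(2, 1, 32, 32)  # privileged input, e.g. a box mask
target = torch.randn(2, 5, 32, 32)    # toy detection targets

# The teacher is assumed pre-trained on (image + privileged info) and frozen.
with torch.no_grad():
    t_feat, _ = teacher(torch.cat([image, priv_mask], dim=1))

s_feat, s_pred = student(image)
task_loss = nn.functional.mse_loss(s_pred, target)     # stand-in for a real detection loss
distill_loss = nn.functional.mse_loss(s_feat, t_feat)  # imitate teacher features

# An intermediate weighting of teacher guidance (the paper's ablations favour
# a middle ground over relying fully on either term).
alpha = 0.5
loss = (1 - alpha) * task_loss + alpha * distill_loss
loss.backward()  # at inference time, only student(image) is needed
```

The key property the paper exploits is visible here: the privileged channel is consumed only by the teacher during training, so the deployed student has exactly the same inputs, size, and inference cost as a baseline detector.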
Why does it matter?
This research is important because it shows a way to significantly improve object detection accuracy without making the systems more complex or slower when they're actually used. This is especially useful for applications where computing power is limited, like on drones or mobile devices, or in situations where reliable object detection is crucial, like self-driving cars or environmental monitoring.
Abstract
This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine-grained, descriptive information available during training but not at inference. We introduce a general, model-agnostic methodology for injecting privileged information, such as bounding box masks, saliency maps, and depth cues, into deep learning-based object detectors through a teacher-student architecture. Experiments are conducted across five state-of-the-art object detection models and multiple public benchmarks, including UAV-based litter detection datasets and Pascal VOC 2012, to assess the impact on accuracy, generalization, and computational efficiency. Our results demonstrate that LUPI-trained students consistently outperform their baseline counterparts, achieving significant boosts in detection accuracy with no increase in inference complexity or model size. Performance improvements are especially marked for medium and large objects, while ablation studies reveal that intermediate weighting of teacher guidance optimally balances learning from privileged and standard inputs. The findings affirm that the LUPI framework provides an effective and practical strategy for advancing object detection systems in both resource-constrained and real-world settings.