ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias
2025-02-18
Summary
This paper introduces ILIAS, a new dataset that tests how well AI systems can find specific objects in a large collection of images. It's like a really hard game of 'Where's Waldo?', but for computers instead of humans.
What's the problem?
Current datasets used to test AI's ability to find specific objects in images have some limitations. They often focus on just one type of object, like landmarks or products, and don't challenge the AI systems enough. This makes it hard to know how well these systems would work in real-world situations where they need to find all kinds of objects in all sorts of conditions.
What's the solution?
The researchers created ILIAS, which includes 1,000 different objects photographed under challenging conditions, like being partially hidden or surrounded by cluttered backgrounds. They also added 100 million 'distractor' images to make the task even harder. They carefully designed the dataset to avoid a common problem in other benchmarks: distractor images that actually contain a query object but aren't labeled as matches. They did this by only using objects that came into existence after the distractor collection was compiled. Then, they tested various AI systems on this new dataset to see how well they performed.
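The core retrieval step being benchmarked can be pictured with a toy sketch (this is an illustration, not the paper's pipeline; the sizes and synthetic data are invented): every image is turned into an embedding vector, and the whole database is ranked by cosine similarity to the query's embedding.

```python
import numpy as np

# Illustrative sketch of instance-level retrieval scoring (not the paper's
# actual pipeline): rank a database of image embeddings by cosine
# similarity to a query embedding. Sizes and data are invented.
rng = np.random.default_rng(0)

dim = 128                                      # hypothetical descriptor size
db = rng.normal(size=(1000, dim))              # stand-in for distractor embeddings
query = db[42] + 0.01 * rng.normal(size=dim)   # a slightly altered view of image 42

# L2-normalize so that a dot product equals cosine similarity
db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)

scores = db_n @ q_n                 # one similarity score per database image
top5 = np.argsort(-scores)[:5]      # indices of the 5 best matches
print(top5[0])                      # image 42 ranks first
```

At the paper's scale of 100 million distractors, exact brute-force search like this is typically replaced by approximate nearest-neighbor indexes, but the scoring idea is the same.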
Why it matters?
This matters because as we develop more advanced AI systems for tasks like image search or robotics, we need ways to accurately test how well they can identify specific objects in complex, real-world situations. ILIAS provides a tough, fair test that can help researchers improve these systems, potentially leading to better image search engines, more capable robots, and other applications that rely on precise object recognition.
Abstract
This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. The key benefits over existing datasets include large scale, domain diversity, accurate ground truth, and a performance that is far from saturated. ILIAS includes query and positive images for 1,000 object instances, manually collected to capture challenging conditions and diverse domains. Large-scale retrieval is conducted against 100 million distractor images from YFCC100M. To avoid false negatives without extra annotation effort, we include only query objects confirmed to have emerged after 2014, i.e. the compilation date of YFCC100M. An extensive benchmarking is performed with the following observations: i) models fine-tuned on specific domains, such as landmarks or products, excel in that domain but fail on ILIAS; ii) learning a linear adaptation layer using multi-domain class supervision results in performance improvements, especially for vision-language models; iii) local descriptors in retrieval re-ranking are still a key ingredient, especially in the presence of severe background clutter; iv) the text-to-image performance of the vision-language foundation models is surprisingly close to the corresponding image-to-image case. website: https://vrg.fel.cvut.cz/ilias/
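Observation ii) mentions learning a linear adaptation layer using multi-domain class supervision. A hedged numpy sketch of that general idea (not the paper's implementation; the dimensions, learning rate, and synthetic data are all invented): frozen backbone embeddings are multiplied by a learned matrix W, trained jointly with a throwaway classifier head on class labels, and only W is kept for retrieval.

```python
import numpy as np

# Sketch of a linear adaptation layer on frozen embeddings (illustrative
# only; all sizes and data are synthetic). A matrix W is trained with
# class supervision via a softmax classifier head C; at retrieval time
# the head is discarded and embeddings are mapped through W alone.
rng = np.random.default_rng(1)

n, dim, n_cls = 400, 32, 4
labels = rng.integers(0, n_cls, size=n)
means = 2.0 * rng.normal(size=(n_cls, dim))      # class centers
feats = means[labels] + rng.normal(size=(n, dim))  # "frozen" embeddings

W = np.eye(dim)                          # the linear adapter (learned)
C = 0.01 * rng.normal(size=(dim, n_cls)) # classifier head (discarded later)
onehot = np.eye(n_cls)[labels]
lr = 0.1
for _ in range(200):
    z = feats @ W                        # adapted embeddings
    logits = z @ C
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    g = (p - onehot) / n                 # softmax cross-entropy gradient
    C -= lr * z.T @ g                    # update head
    W -= lr * feats.T @ (g @ C.T)        # update adapter

adapted = feats @ W                      # what retrieval would actually use
```

The design point is that only a dim-by-dim matrix is learned, so the adaptation is cheap and the frozen foundation model never needs fine-tuning.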