MegaLoc: One Retrieval to Place Them All

Gabriele Berton, Carlo Masone

2025-02-25


Summary

This paper introduces MegaLoc, a new AI system that can recognize places in photos across different types of tasks, such as finding landmarks or helping robots navigate.

What's the problem?

Current AI systems for recognizing places in photos are usually designed for just one specific task, such as finding landmarks or helping with navigation. They often perform poorly when applied to slightly different tasks or when they encounter unfamiliar types of images.

What's the solution?

The researchers created MegaLoc by combining existing methods, training techniques, and datasets from various place recognition tasks. This approach allows MegaLoc to perform well across multiple tasks instead of being limited to just one. They tested MegaLoc on many datasets and found that it outperformed existing systems on a large number of Visual Place Recognition benchmarks and on Visual Localization (the LaMAR datasets), while also achieving strong results on common Landmark Retrieval datasets.

Why does it matter?

This matters because a single AI system that handles multiple place recognition tasks well is more versatile and useful in real-world applications. It could improve navigation apps and augmented reality experiences, and help robots understand their surroundings. The researchers have also made their code available, which allows other scientists and developers to use and build upon their work.

Abstract

Retrieving images from the same location as a given query is an important component of multiple computer vision tasks, like Visual Place Recognition, Landmark Retrieval, Visual Localization, 3D reconstruction, and SLAM. However, existing solutions are built to work specifically for one of these tasks, and are known to fail when the requirements change slightly or when they encounter out-of-distribution data. In this paper we combine a variety of existing methods, training techniques, and datasets to train a retrieval model, called MegaLoc, that performs well on multiple tasks. We find that MegaLoc (1) achieves state-of-the-art results on a large number of Visual Place Recognition datasets, (2) achieves impressive results on common Landmark Retrieval datasets, and (3) sets a new state of the art for Visual Localization on the LaMAR datasets, where the only change we made to the existing localization pipeline was the retrieval method. The code for MegaLoc is available at https://github.com/gmberton/MegaLoc
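The abstract frames MegaLoc as a retrieval model: given a query image, it returns database images taken at the same place. The sketch below illustrates only that generic retrieval step, assuming each image has already been mapped to a fixed-length global descriptor by some model (here, made-up 3-dimensional vectors). The function names `l2_normalize` and `retrieve_top_k` are hypothetical and are not taken from the MegaLoc repository, and ranking by cosine similarity is a common choice for descriptor-based retrieval, not a statement about MegaLoc's internals.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve_top_k(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (database_index, cosine_similarity) pairs
    for the k database descriptors most similar to the query.

    Brute-force search over every database entry; assumes all descriptors
    come from the same (unspecified) model and have the same length.
    """
    q = l2_normalize(query)
    scores = []
    for idx, desc in enumerate(database):
        d = l2_normalize(desc)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    # Highest similarity first; ties are broken by the lower database index.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # Toy database of three descriptors (illustrative values, not real data).
    database = [
        [1.0, 0.0, 0.0],  # image 0
        [0.9, 0.1, 0.0],  # image 1
        [0.0, 1.0, 0.0],  # image 2
    ]
    query = [0.8, 0.3, 0.0]
    top = retrieve_top_k(query, database, k=2)
    print([idx for idx, _ in top])  # prints [1, 0]
```

Because this step consumes only descriptors, a different descriptor model can be swapped in without touching the rest of a localization pipeline, which matches the abstract's LaMAR setup where only the retrieval method was changed. For large databases, the linear scan here would typically be replaced by an (approximate) nearest-neighbor index.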
