Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer
2024-07-13

Summary
This paper presents Map It Anywhere (MIA), a data engine for curating detailed Bird's Eye View (BEV) map prediction data from publicly available sources. BEV maps are a rich, flexible representation that helps ground robots navigate effectively.
What's the problem?
Current methods for predicting BEV maps from First-Person View (FPV) images generalize poorly: they work well only within the small regions covered by existing autonomous vehicle datasets. As a result, they cannot easily adapt to new environments or larger regions, which is a significant drawback for applications like autonomous driving.
What's the solution?
To solve this problem, the authors developed MIA, a data engine that collects and aligns map prediction data from two large crowd-sourced platforms: Mapillary for FPV images and OpenStreetMap for BEV semantic maps. Using these platforms, MIA automatically gathered a dataset of 1.2 million FPV-BEV pairs spanning diverse geographies, landscapes, camera models, and capture conditions. The authors also trained a simple, camera model-agnostic model on this data to predict BEV maps. Their evaluations show that pretraining on the MIA dataset yields zero-shot BEV map prediction that exceeds baselines trained on existing datasets by 35%.
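To make the pairing step concrete, below is a minimal sketch of how one might rasterize pre-fetched OpenStreetMap geometries into a camera-aligned BEV semantic grid for a single geotagged FPV capture. This illustrates the general technique rather than MIA's actual implementation; the grid size, resolution, class IDs, and the assumption that geometries are already in a local metric coordinate frame are all hypothetical.

```python
# Hypothetical sketch: rasterize OSM geometries into a heading-aligned
# top-down semantic grid centred on one FPV capture. Grid size, resolution,
# and class IDs are illustrative, not MIA's actual settings.
import numpy as np
from shapely.affinity import rotate, translate
from rasterio.features import rasterize
from rasterio.transform import from_origin

GRID = 224   # BEV grid size in pixels (assumed)
RES = 0.5    # metres per pixel (assumed)

def bev_from_osm(osm_geoms_by_class, cam_xy, heading_deg):
    """Build a camera-aligned BEV semantic label map from OSM geometries.

    osm_geoms_by_class: {class_id: [shapely geometries]} in a local metric CRS
    cam_xy: (x, y) camera position in that same CRS (x east, y north)
    heading_deg: compass heading of the FPV image, clockwise from north
    """
    half = GRID * RES / 2.0
    # Affine transform from metric coordinates (camera at the origin)
    # to pixel indices, with the camera at the centre of the grid.
    transform = from_origin(-half, half, RES, RES)

    bev = np.zeros((GRID, GRID), dtype=np.uint8)
    for class_id, geoms in sorted(osm_geoms_by_class.items()):
        aligned = []
        for g in geoms:
            g = translate(g, xoff=-cam_xy[0], yoff=-cam_xy[1])  # recentre on camera
            g = rotate(g, heading_deg, origin=(0, 0))           # heading -> grid "up"
            aligned.append(g)
        if aligned:
            mask = rasterize(aligned, out_shape=(GRID, GRID),
                             transform=transform, fill=0, dtype="uint8")
            bev[mask == 1] = class_id  # later classes overwrite earlier ones
    return bev  # (GRID, GRID) uint8 semantic BEV map
```

Each BEV raster produced this way would then be paired with its corresponding FPV image to form one training example.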
Why it matters?
This research is important because it demonstrates how to leverage large-scale public data to improve the accuracy and generalizability of BEV map prediction for robot navigation. By making it easy to curate large, diverse training data, MIA has the potential to make autonomous navigation systems more reliable and effective across varied environments.
Abstract
Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we demonstrate the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.
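The phrase "camera model-agnostic" matters here: crowd-sourced images come from many different cameras, so the model cannot bake in a single set of intrinsics. Below is a small sketch of one standard way to handle this, inverse perspective mapping driven by each image's own intrinsics. It illustrates the general idea, not the paper's actual architecture, and all parameter values are made up.

```python
# Hypothetical sketch of camera-agnostic ground projection via inverse
# perspective mapping (IPM): each BEV cell on the ground plane is mapped
# to FPV pixel coordinates using that image's own intrinsics, so the
# lookup adapts per camera. Values below are illustrative only.
import numpy as np

def ipm_lookup(K, cam_height, grid=224, res=0.5):
    """Return a (grid, grid, 2) array of FPV pixel coords per BEV cell.

    K: 3x3 pinhole intrinsics for this specific image
    cam_height: camera height above the ground plane, in metres
    Cells behind the camera are marked NaN.
    """
    half = grid * res / 2.0
    # BEV cell centres in camera-frame ground coords: x right, y down, z forward.
    xs = (np.arange(grid) + 0.5) * res - half   # lateral offset
    zs = half - (np.arange(grid) + 0.5) * res   # forward range, row 0 = farthest
    X, Z = np.meshgrid(xs, zs)
    Y = np.full_like(X, cam_height)             # ground plane below the camera
    pts = np.stack([X, Y, Z], axis=-1)          # (grid, grid, 3)

    uvw = pts @ K.T                             # pinhole projection
    uv = uvw[..., :2] / uvw[..., 2:3]
    uv[Z <= 0] = np.nan                         # cells behind the camera
    return uv

# Example intrinsics for a 1920x1080 image (made up):
K = np.array([[1000., 0., 960.],
              [0., 1000., 540.],
              [0., 0., 1.]])
lut = ipm_lookup(K, cam_height=1.6)
# Sampling FPV features at `lut` would populate the corresponding BEV cells.
```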