Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping

Saurabh Kaushik, Lalit Maurya, Beth Tellman

2026-01-06

Summary

This paper introduces Prithvi-CAFE, a method for improving the accuracy of flood mapping from satellite imagery. It builds on existing Geo-Foundation Models (GFMs), which perform well on many image tasks but struggle to capture the fine local detail that accurate flood mapping requires.

What's the problem?

Large pre-trained Geo-Foundation Models are effective across downstream tasks such as semantic segmentation, classification, and regression, yet on flood mapping with the Sen1Flood11 dataset they struggle to outperform a simpler baseline, U-Net. Flood maps depend on very specific local patterns in the imagery, and these larger models tend to miss them even though they capture long-range context well.

What's the solution?

The researchers created Prithvi-CAFE, which pairs the pre-trained Prithvi encoder with a parallel convolutional neural network (CNN) residual branch. The CNN branch is designed to pick up important local details using Convolutional Attention Modules (CAM), and its features are fused with Prithvi's at multiple scales and levels. In effect, the large model gets a 'local detail booster'. Adapters inserted in Prithvi keep fine-tuning fast and efficient.
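The ideas above (an adapter on the pre-trained encoder, an attention-gated CNN branch, and feature fusion) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `adapter`, `channel_attention`, and `fuse`, the toy shapes, the squeeze-and-excitation-style gate standing in for CAM, and fusion by channel concatenation are all hypothetical choices made here. The released code should be consulted for the actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(tokens, w_down, w_up):
    """Residual bottleneck adapter: tokens + up(relu(down(tokens)))."""
    hidden = np.maximum(tokens @ w_down, 0.0)   # (N, r) low-rank bottleneck
    return tokens + hidden @ w_up               # residual keeps pre-trained features

def channel_attention(feat, w1, w2):
    """Channel gate on a (C, H, W) map; an assumed stand-in for a CAM block."""
    squeezed = feat.mean(axis=(1, 2))                   # global average pool -> (C,)
    logits = np.maximum(squeezed @ w1, 0.0) @ w2        # small two-layer MLP -> (C,)
    gate = 1.0 / (1.0 + np.exp(-logits))                # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]

def fuse(tokens, cnn_feat, grid):
    """Reshape (N, D) tokens to (D, h, w) and concatenate with (C, h, w) CNN features."""
    h, w = grid
    vit_map = tokens.T.reshape(-1, h, w)
    return np.concatenate([vit_map, cnn_feat], axis=0)

# Toy sizes (hypothetical): 4x4 token grid, 8-dim tokens, 6 CNN channels.
h = w = 4
D, C, r = 8, 6, 2
tokens = rng.normal(size=(h * w, D))
w_down = rng.normal(size=(D, r)) * 0.1
w_up = np.zeros((r, D))                 # zero init: the adapter starts as the identity
cnn_feat = rng.normal(size=(C, h, w))
w1 = rng.normal(size=(C, 3))
w2 = rng.normal(size=(3, C))

adapted = adapter(tokens, w_down, w_up)
attended = channel_attention(cnn_feat, w1, w2)
fused = fuse(adapted, attended, (h, w))
print(fused.shape)  # (14, 4, 4)
```

In this sketch only `w_down`, `w_up`, and the CNN-side weights would be trained, which is the general reason adapter-style fine-tuning is cheap: the large encoder's own weights stay frozen.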

Why it matters?

This work matters because accurate flood mapping is crucial for disaster response and mitigation. Prithvi-CAFE outperforms the baseline U-Net and other GFMs on two flood mapping datasets, Sen1Flood11 and FloodPlanet, with the largest gain on a hold-out test site. This is especially valuable where detailed, localized information is critical for effective action, and it suggests a practical way to combine the strengths of large pre-trained models with the precision of more focused techniques.

Abstract

Geo-Foundation Models (GFMs) have proven effective in diverse downstream applications, including semantic segmentation, classification, and regression tasks. However, for flood mapping on the Sen1Flood11 dataset as a downstream task, GFMs struggle to outperform the baseline U-Net, highlighting the models' limitation in capturing critical local nuances. To address this, we present the Prithvi-Complementary Adaptive Fusion Encoder (CAFE), which integrates the Prithvi GFM pretrained encoder with a parallel CNN residual branch enhanced by Convolutional Attention Modules (CAM). Prithvi-CAFE enables fast and efficient fine-tuning through adapters in Prithvi and performs multi-scale, multi-level fusion with CNN features, capturing critical local details while preserving long-range dependencies. We achieve state-of-the-art results on two comprehensive flood mapping datasets: Sen1Flood11 and FloodPlanet. On Sen1Flood11 test data, Prithvi-CAFE (IoU 83.41) outperforms the original Prithvi (IoU 82.50) and other major GFMs (TerraMind 82.90, DOFA 81.54, SpectralGPT 81.02). The improvement is even more pronounced on the hold-out test site, where Prithvi-CAFE achieves an IoU of 81.37 compared to the baseline U-Net (70.57) and the original Prithvi (72.42). On FloodPlanet, Prithvi-CAFE also surpasses the baseline U-Net and other GFMs, achieving an IoU of 64.70 compared to U-Net (60.14), TerraMind (62.33), DOFA (59.15), and Prithvi 2.0 (61.91). Our simple yet effective Prithvi-CAFE demonstrates strong potential for improving segmentation tasks where multi-channel and multi-modal data provide complementary information and local details are critical. The code is released at https://github.com/Sk-2103/Prithvi-CAFE
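For readers unfamiliar with the metric, the IoU (Intersection over Union) scores quoted in the abstract can be illustrated with a minimal pure-Python sketch. The function name `binary_iou`, the toy masks, and the convention that `-1` marks "no data" pixels to be skipped are assumptions made here for illustration; the paper's exact evaluation protocol (for example, how scores are averaged over images or classes) is not specified in the abstract. The second part simply subtracts the hold-out scores reported above.

```python
def binary_iou(pred, target, positive=1, ignore=-1):
    """Intersection over Union for one class, skipping pixels labelled `ignore`."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore:              # assumed "no data" label, excluded from the score
                continue
            p_pos, t_pos = (p == positive), (t == positive)
            intersection += p_pos and t_pos
            union += p_pos or t_pos
    return intersection / union if union else float("nan")

# Toy 2x3 masks (hypothetical): 1 = water, 0 = not water, -1 = no data.
pred   = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, -1]]
iou = binary_iou(pred, target)
print(f"{100 * iou:.2f}")  # 66.67

# Hold-out test site IoU values as reported in the abstract.
holdout = {"Prithvi-CAFE": 81.37, "U-Net": 70.57, "Prithvi": 72.42}
gains = {
    name: round(holdout["Prithvi-CAFE"] - score, 2)
    for name, score in holdout.items()
    if name != "Prithvi-CAFE"
}
print(gains)  # {'U-Net': 10.8, 'Prithvi': 8.95}
```

The gains are in absolute IoU points: on the hold-out site, Prithvi-CAFE is 10.80 points above U-Net and 8.95 points above the original Prithvi, compared with a 0.91-point margin over Prithvi on the Sen1Flood11 test split.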
