Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images

Xuechao Zou, Shun Zhang, Kai Li, Shiying Wang, Junliang Xing, Lei Jin, Congyan Lang, Pin Tao

2024-11-25

Summary

This paper introduces Cloud-Adapter, a parameter-efficient method that improves the accuracy and robustness of cloud segmentation in remote sensing images by adapting an existing vision foundation model instead of retraining it.

What's the problem?

Cloud segmentation is a key step in interpreting remote sensing images, but it is difficult: clouds vary widely in appearance across satellite sources, sensors, and land cover scenarios, and existing methods often fail to identify them reliably under all of these conditions. Training specialized models from scratch also demands large amounts of labeled data and compute, which makes high accuracy in cloud detection expensive to achieve.

What's the solution?

The authors present Cloud-Adapter, a method that attaches a lightweight trainable module to a pre-trained vision foundation model (VFM) to enhance its ability to segment clouds. The VFM's parameters stay frozen, so the large backbone never needs to be retrained. Instead, a small convolutional neural network extracts dense spatial features from the images, and these aggregated features are used to modulate the frozen transformer layers of the VFM, refining the segmentation. As a result, the system performs well while training only about 0.6% as many parameters as the frozen backbone contains. A minimal sketch of this pattern appears below.
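To make this concrete, here is a minimal PyTorch sketch of the frozen-backbone adapter pattern the paper describes. The module names (SpatialPerception, AdaptedBlock), the pooled context vector, and the residual modulation scheme are illustrative assumptions, not the authors' implementation; the released code in the repository linked in the abstract is authoritative.

```python
import torch
import torch.nn as nn


class SpatialPerception(nn.Module):
    """Lightweight ConvNet that extracts dense spatial context.

    Illustrative stand-in for the paper's spatial perception module;
    the published architecture may differ.
    """

    def __init__(self, in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, dim // 4, 3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 4, dim, 3, stride=2, padding=1),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) -> pooled context vector: (B, dim)
        feats = self.convs(x)
        return feats.mean(dim=(2, 3))


class AdaptedBlock(nn.Module):
    """One frozen transformer block plus a tiny trainable adapter.

    The adapter injects the ConvNet context into the block's output;
    the residual modulation used here is a hypothetical simplification.
    """

    def __init__(self, frozen_block: nn.Module, dim: int = 768):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():
            p.requires_grad = False         # keep the VFM backbone frozen
        self.adapter = nn.Linear(dim, dim)  # the only trainable parameters

    def forward(self, tokens: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) from the VFM; context: (B, dim) from the ConvNet
        tokens = self.block(tokens)
        # Broadcast the pooled context across all tokens as a residual update.
        return tokens + self.adapter(context).unsqueeze(1)
```

Because only the adapter layers receive gradients, the optimizer updates a tiny fraction of the total parameters, which is what makes the approach parameter-efficient.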

Why it matters?

This research is significant because it demonstrates how existing models can be adapted to improve their performance on specific tasks like cloud segmentation without needing extensive resources. By enhancing cloud detection capabilities, this method can lead to better analysis of remote sensing data, which is crucial for applications in environmental monitoring, agriculture, and disaster response.

Abstract

Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly impacts the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFM) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed to enhance the accuracy and robustness of cloud segmentation. Our method leverages a VFM pretrained on general domain data, which remains frozen, eliminating the need for additional training. Cloud-Adapter incorporates a lightweight spatial perception module that initially utilizes a convolutional neural network (ConvNet) to extract dense spatial representations. These multi-scale features are then aggregated and serve as contextual inputs to an adapting module, which modulates the frozen transformer layers within the VFM. Experimental results demonstrate that the Cloud-Adapter approach, utilizing only 0.6% of the trainable parameters of the frozen backbone, achieves substantial performance gains. Cloud-Adapter consistently attains state-of-the-art (SOTA) performance across a wide variety of cloud segmentation datasets from multiple satellite sources, sensor series, data processing levels, land cover scenarios, and annotation granularities. We have released the source code and pretrained models at https://github.com/XavierJiezou/Cloud-Adapter to support further research.
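The abstract's 0.6% figure is the ratio of the adapter's trainable parameters to the frozen backbone's parameters. A simple way to check such a ratio for any pair of PyTorch modules is sketched below; this is generic code, not tied to the released checkpoints.

```python
import torch.nn as nn


def param_ratio(adapter: nn.Module, backbone: nn.Module) -> float:
    """Ratio of adapter parameters to (frozen) backbone parameters.

    Cloud-Adapter reports roughly 0.006 (0.6%); the exact value depends
    on the chosen backbone and adapter configuration.
    """
    n_adapter = sum(p.numel() for p in adapter.parameters())
    n_backbone = sum(p.numel() for p in backbone.parameters())
    return n_adapter / n_backbone
```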