DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

Hongjie Zhu, Zeyu Zhang, Guansong Pang, Xu Wang, Shimin Wen, Yu Bai, Daji Ergu, Ying Cai, Yang Zhao

2025-02-27

Summary

This paper introduces DOEI (Dual Optimization of Embedding Information), a method that helps AI systems segment and label the parts of an image more accurately when they are trained with only limited annotations.

What's the problem?

Current AI systems that label parts of images from limited annotations often make mistakes. They may mark objects that merely tend to appear alongside the target (co-occurrence) or miss parts of the target object (under-activation), because their activation maps are only loosely tied to what each image region actually represents.

What's the solution?

The researchers created DOEI, which rebuilds the model's internal representations (embeddings) using attention weights that reflect semantic content. During the interaction between class tokens and image patches, DOEI amplifies the tokens the model is confident about and suppresses those it is not. It also combines three kinds of information (RGB values, embedding-guided features, and self-attention weights) to decide which candidate tokens are reliable.

Why does it matter?

This matters because it lets AI systems understand images with less detailed human labeling. That could be useful in many areas, such as helping self-driving cars recognize objects on the road more accurately or helping medical imaging systems identify signs of disease. The paper reports that plugging DOEI into existing transformer-based models raises segmentation accuracy (mIoU) by up to 3.6% on PASCAL VOC and up to 1.6% on MS COCO.

Abstract

Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation, resulting in inferior recognition accuracy. To tackle this issue, we propose DOEI, Dual Optimization of Embedding Information, a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expression capability of embedding information. Specifically, DOEI amplifies tokens with high confidence and suppresses those with low confidence during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to more accurately represent target features in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that empowers state-of-the-art visual transformer-based WSSS models to significantly improve the quality of CAMs and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
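The amplify-and-suppress step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how confidence is measured, so the min-max confidence score, the `high` and `low` thresholds, the `gain` and `damp` factors, and the function names below are all assumptions chosen for illustration. The sketch takes the class-to-patch attention weights for one class token, boosts the high-confidence patch tokens, damps the low-confidence ones, renormalizes, and rebuilds the class embedding as a weighted sum of patch embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_class_to_patch(attn, high=0.6, low=0.2, gain=1.5, damp=0.5):
    """Amplify confident patch tokens and suppress unconfident ones.

    attn: class-to-patch attention weights for one class token.
    Confidence here is a min-max rescaling of attn (an assumption);
    high, low, gain, and damp are hypothetical hyperparameters.
    """
    smallest, largest = min(attn), max(attn)
    span = largest - smallest
    if span == 0:
        return list(attn)  # uniform attention: nothing to separate
    adjusted = []
    for weight in attn:
        confidence = (weight - smallest) / span
        if confidence >= high:
            adjusted.append(weight * gain)   # amplify
        elif confidence <= low:
            adjusted.append(weight * damp)   # suppress
        else:
            adjusted.append(weight)          # leave mid-range tokens alone
    total = sum(adjusted)
    return [w / total for w in adjusted]     # renormalize to sum to 1


def rebuild_embedding(weights, patch_embeddings):
    """Weighted sum of patch embeddings (one vector per patch)."""
    dim = len(patch_embeddings[0])
    return [
        sum(w * patch[d] for w, patch in zip(weights, patch_embeddings))
        for d in range(dim)
    ]


# Toy example: four patches with two-dimensional embeddings.
attn = softmax([2.0, 1.0, 0.0, -1.0])
patches = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
reweighted = reweight_class_to_patch(attn)
before = rebuild_embedding(attn, patches)
after = rebuild_embedding(reweighted, patches)
```

In this toy case the first patch gains weight and the last two lose weight, so the rebuilt embedding `after` moves closer to the first patch's vector than `before`. A real model would apply a learned or data-driven rule inside the transformer's attention layers rather than fixed thresholds.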
