
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

Siyuan Li, Juanxi Tian, Zedong Wang, Luyuan Zhang, Zicheng Liu, Weiyang Jin, Yang Liu, Baigui Sun, Stan Z. Li

2024-10-10


Summary

This paper explores the relationship between the backbone models (the core structure of neural networks) and optimizers (the algorithms that adjust the learning process) in visual representation learning, introducing a concept called backbone-optimizer coupling bias (BOCB).

What's the problem?

In visual representation learning, different backbone models (like CNNs and ViTs) often perform better with specific optimizers. This interdependence can introduce biases that affect how well models learn and adapt. Without understanding these relationships, researchers may struggle to choose the right backbone-optimizer combinations, which can limit the effectiveness of their models.

What's the solution?

The authors investigate how different backbone models interact with various optimizers and introduce the concept of BOCB. They conduct experiments to analyze how this coupling affects model training and performance. By examining different backbone-optimizer pairs, they provide insights into which combinations work best together, helping to guide future research and development in this area.
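The coupling pattern the paper observes can be sketched as a simple lookup. The mapping below is an illustrative assumption based on the abstract's observations (classic CNNs such as VGG and ResNet pairing with SGD-family optimizers; ViTs and ConvNeXt pairing with adaptive learning-rate optimizers), not the paper's exact recommendation table; the helper name `recommend_optimizer` is hypothetical.

```python
# Illustrative sketch of the BOCB observation: classic CNNs couple with
# SGD-style optimizers, while ViTs and ConvNeXt couple with adaptive
# learning-rate optimizers such as AdamW. This mapping is an assumption
# for demonstration, not the paper's full recommendation table.
RECOMMENDED_OPTIMIZER = {
    "vgg": "SGD",
    "resnet": "SGD",
    "vit": "AdamW",
    "convnext": "AdamW",
}

def recommend_optimizer(backbone: str) -> str:
    """Return an optimizer family suggested by the backbone's coupling bias."""
    key = backbone.lower()
    for family, optimizer in RECOMMENDED_OPTIMIZER.items():
        if key.startswith(family):
            return optimizer
    # Assume unknown/modern architectures behave like recent backbones.
    return "AdamW"

print(recommend_optimizer("ResNet-50"))  # SGD
print(recommend_optimizer("ViT-B/16"))   # AdamW
```

In practice, the paper's point is that such a lookup should not be necessary: a robust backbone design would perform well across optimizer families rather than depending on one.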

Why it matters?

This research is important because it sheds light on a previously overlooked aspect of model training in visual representation learning. By understanding how backbones and optimizers influence each other, researchers can make better choices when designing AI systems, leading to more robust and effective models for tasks like image recognition and processing.

Abstract

This paper delves into the interplay between vision backbones and optimizers, unveiling an inter-dependent phenomenon termed backbone-optimizer coupling bias (BOCB). We observe that canonical CNNs, such as VGG and ResNet, exhibit a marked co-dependency with SGD families, while recent architectures like ViTs and ConvNeXt share a tight coupling with adaptive learning rate optimizers. We further show that BOCB can be introduced by both optimizers and certain backbone designs and may significantly impact the pre-training and downstream fine-tuning of vision models. Through in-depth empirical analysis, we summarize takeaways on recommended optimizers and insights into robust vision backbone architectures. We hope this work can inspire the community to question long-held assumptions on backbones and optimizers, stimulate further explorations, and thereby contribute to more robust vision systems. The source code and models are publicly available at https://bocb-ai.github.io/.