
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

2025-03-11


Summary

This paper introduces a way to help AI models work better on new types of data (like photos from different cameras or medical scans from new hospitals) by using hidden patterns learned by image-generating AI models called diffusion models.

What's the problem?

AI models often struggle with new data that looks different from what they were trained on, like a self-driving car failing to recognize pedestrians in foggy weather if it was only trained on sunny days.

What's the solution?

The method uses features from diffusion models (AI models that generate images) to discover hidden 'styles' in data, like lighting or texture differences, without any labels. These style groups, called pseudo-domains, are then fed to the classifier alongside its usual features, so the model can adapt to new data without needing labels for every possible scenario.
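The two-step recipe (discover pseudo-domains by clustering pre-trained features without labels, then augment the classifier's input with a pseudo-domain representation) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact implementation: the random features stand in for a frozen diffusion-model encoder, and the choice of KMeans, the number of pseudo-domains, and the centroid-concatenation scheme are all illustrative assumptions.

```python
# Illustrative sketch of pseudo-domain discovery + feature augmentation.
# The random "features" stand in for embeddings from a frozen pre-trained
# encoder (the paper finds diffusion-model features separate domains best).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_samples, feat_dim = 600, 32
features = rng.normal(size=(n_samples, feat_dim))   # stand-in encoder features
labels = rng.integers(0, 5, size=n_samples)         # synthetic class labels

# 1) Discover pseudo-domains: unsupervised clustering, no domain labels used.
n_pseudo_domains = 4  # illustrative choice
km = KMeans(n_clusters=n_pseudo_domains, n_init=10, random_state=0)
km.fit(features)

# 2) Represent each sample by its pseudo-domain (here: the cluster centroid)
#    and concatenate that with the original features.
centroids = km.cluster_centers_[km.predict(features)]
augmented = np.concatenate([features, centroids], axis=1)

# 3) Train an ordinary classifier on the augmented features.
clf = LogisticRegression(max_iter=1000).fit(augmented, labels)
print(augmented.shape)  # (600, 64)
```

At test time, a new sample would be assigned to its nearest pseudo-domain and augmented the same way, which is what lets the classifier lean on domain-specific cues it never saw labeled.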

Why it matters?

This makes AI more reliable in real-world situations where data constantly changes, like healthcare or robotics, without needing costly retraining. Notably, the approach even outperforms most methods that are given explicit domain labels during training.

Abstract

Domain Generalization aims to develop models that can generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to effectively leverage them for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-domains, that capture domain-specific variations in an unsupervised manner. Next, we augment existing classifiers with these complementary pseudo-domain representations making them more amenable to diverse unseen test domains. We analyze how different pre-training feature spaces differ in the domain-specific variances they capture. Our empirical studies reveal that features from diffusion models excel at separating domains in the absence of explicit domain labels and capture nuanced domain-specific information. On 5 datasets, we show that our very simple framework improves generalization to unseen domains by a maximum test accuracy improvement of over 4% compared to the standard baseline Empirical Risk Minimization (ERM). Crucially, our method outperforms most algorithms that access domain labels during training.