FreSca: Unveiling the Scaling Space in Diffusion Models
Chao Huang, Susan Liang, Yunlong Tang, Li Ma, Yapeng Tian, Chenliang Xu
2025-04-04
Summary
This paper studies how diffusion models use guidance scaling when they edit images, and shows that scaling the low- and high-frequency parts of that guidance separately gives better results.
What's the problem?
Diffusion models can edit images, and the strength of an edit is adjusted by scaling the model's guidance. How that scaling can be used for fine-grained control has not been well explored, which limits how precisely we can steer the editing process.
What's the solution?
The researchers ran a Fourier analysis of the models' noise predictions and found that the low-frequency and high-frequency components evolve differently over the course of diffusion. They created a method called FreSca that scales the guidance for each frequency band separately, which improves existing image editing methods without retraining.
Why does it matter?
This work matters because it gives us finer control over AI image editing without retraining the model, and the same idea also improves image understanding tasks such as depth estimation.
Abstract
Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and classifier-free guidance enabling adjustable scaling. This scaling mechanism implicitly defines a "scaling space" whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing where the difference between conditional and unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, revealing that their low- and high-frequency components evolve differently throughout diffusion. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Excitingly, its effectiveness extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
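To make the idea of frequency-dependent guidance scaling concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function name frequency_scaled_guidance, the parameters low_scale, high_scale, and cutoff, the circular low-pass mask, and the way the result is combined with the usual guidance weight are all assumptions made for illustration; the abstract only states that scaling is applied independently to different frequency bands of the noise-prediction difference.

```python
import numpy as np


def frequency_scaled_guidance(eps_cond, eps_uncond, guidance_scale=7.5,
                              low_scale=1.0, high_scale=1.0, cutoff=0.25):
    """Illustrative sketch (hypothetical API, not the paper's code).

    eps_cond, eps_uncond: arrays of shape (..., H, W) holding the
    conditional and unconditional noise predictions.
    cutoff: assumed radial threshold in cycles per pixel (0 to ~0.71)
    separating "low" from "high" frequencies.
    """
    # Difference that classifier-free guidance normally scales as a whole.
    delta = eps_cond - eps_uncond

    # Radial frequency of every FFT bin, in cycles per pixel.
    h, w = delta.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)

    # Per-bin weight: low_scale inside the cutoff, high_scale outside.
    weights = np.where(radius <= cutoff, low_scale, high_scale)

    # Scale each band in the Fourier domain, then return to pixel space.
    spectrum = np.fft.fft2(delta, axes=(-2, -1))
    delta_scaled = np.fft.ifft2(spectrum * weights, axes=(-2, -1)).real

    # Assumed combination with the standard guidance weight.
    return eps_uncond + guidance_scale * delta_scaled


# Example call on random stand-ins for real model outputs.
rng = np.random.default_rng(0)
eps_c = rng.standard_normal((4, 64, 64))
eps_u = rng.standard_normal((4, 64, 64))
eps_guided = frequency_scaled_guidance(eps_c, eps_u, low_scale=1.0, high_scale=1.5)
```

With low_scale and high_scale both set to 1, the sketch reduces to ordinary classifier-free guidance, so the two extra factors can be read as adjustments on top of an existing pipeline. Any particular values, including the ones in the example call, are placeholders rather than settings reported in the paper.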