Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan

2025-03-13

Summary

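This paper tackles a weakness of AI image generators: their output can change noticeably when the input is shifted by even a fraction of a pixel. The authors redesign the model so that a small shift in the input produces a correspondingly small, smooth shift in the output.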

What's the problem?

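Current latent diffusion models are unstable: shifting or slightly perturbing the input noise, even by less than a pixel, can produce a noticeably different image, and aliasing shows up as visual artifacts such as jagged lines. This makes edits that should change only a little look inconsistent and messy.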
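To make the sub-pixel problem concrete, here is a minimal 1-D NumPy sketch (an illustration, not code from the paper). It assumes a fractional shift is implemented with the Fourier shift theorem on a periodic, band-limited signal, and it measures shift-equivariance as the relative gap between f(shift(x)) and shift(f(x)). The helper names `fractional_shift`, `equivariance_error`, `blur`, and `relu` are hypothetical.

```python
import numpy as np

def fractional_shift(x: np.ndarray, shift: float) -> np.ndarray:
    """Circularly shift a 1-D signal by `shift` samples (may be non-integer).

    Treats x as samples of a band-limited periodic signal and applies the
    Fourier shift theorem; for integer shifts this matches np.roll(x, shift).
    """
    freqs = np.fft.fftfreq(x.shape[-1])          # cycles per sample
    phase = np.exp(-2j * np.pi * freqs * shift)  # moves content toward higher indices
    return np.real(np.fft.ifft(np.fft.fft(x) * phase))

def equivariance_error(f, x: np.ndarray, shift: float) -> float:
    """Relative gap between f(shift(x)) and shift(f(x)); 0 means shift-equivariant."""
    lhs = f(fractional_shift(x, shift))
    rhs = fractional_shift(f(x), shift)
    return float(np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))

def blur(x: np.ndarray) -> np.ndarray:
    # Linear circular convolution with the kernel [1/4, 1/2, 1/4].
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def relu(x: np.ndarray) -> np.ndarray:
    # Pointwise nonlinearity: creates frequencies the sample grid cannot represent.
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)

print(equivariance_error(blur, x, 0.5))  # floating-point round-off only
print(equivariance_error(relu, x, 0.5))  # clearly non-zero: the nonlinearity aliases
```

A linear filter commutes with a half-pixel shift, but a pointwise nonlinearity such as ReLU introduces high-frequency content that aliases on the sample grid, so shifting before and after it gives different results. This is a general source of aliasing in neural networks, which stack many such nonlinearities.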

What's the solution?

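The researchers redesigned the model's core components (the VAE and the U-Net) to handle sub-pixel shifts correctly: they added anti-aliasing filters, made the self-attention modules shift-equivariant, and trained with an equivariance loss that keeps the internal features smooth enough to shift cleanly.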
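The idea behind anti-aliasing filters can be sketched in the same 1-D setting. This hypothetical NumPy example (not the paper's implementation) compares plain stride-2 subsampling with subsampling after an ideal low-pass filter; a one-sample shift at full resolution should become a half-sample shift at half resolution. The ideal FFT low-pass is an assumption chosen because it makes the effect exact; practical networks use finite filters that only approximate it.

```python
import numpy as np

def fractional_shift(x: np.ndarray, shift: float) -> np.ndarray:
    """Band-limited circular shift by `shift` samples via the Fourier shift theorem."""
    freqs = np.fft.fftfreq(x.shape[-1])
    return np.real(np.fft.ifft(np.fft.fft(x) * np.exp(-2j * np.pi * freqs * shift)))

def ideal_lowpass(x: np.ndarray, cutoff: float) -> np.ndarray:
    """Zero every Fourier component whose |frequency| >= cutoff (cycles/sample)."""
    spectrum = np.fft.fft(x)
    spectrum[np.abs(np.fft.fftfreq(x.shape[-1])) >= cutoff] = 0
    return np.real(np.fft.ifft(spectrum))

def naive_downsample(x: np.ndarray) -> np.ndarray:
    # Stride-2 subsampling with no filtering: content above the new Nyquist aliases.
    return x[::2]

def antialiased_downsample(x: np.ndarray) -> np.ndarray:
    # Remove everything at or above the new Nyquist (0.25 cycles/sample), then subsample.
    return ideal_lowpass(x, 0.25)[::2]

def downsample_equivariance_error(down, x: np.ndarray, shift: float) -> float:
    """Shifting the input by `shift` should shift the 2x-downsampled output by shift / 2."""
    lhs = down(fractional_shift(x, shift))
    rhs = fractional_shift(down(x), shift / 2)
    return float(np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))

rng = np.random.default_rng(0)
x = rng.standard_normal(128)

print(downsample_equivariance_error(naive_downsample, x, 1.0))        # large
print(downsample_equivariance_error(antialiased_downsample, x, 1.0))  # round-off only
```

Filtering out everything the half-resolution grid cannot represent before subsampling removes the aliased components, so the downsampled output shifts by exactly half as much as the input, including for fractional shifts.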

Why does it matter?

This makes AI image editing more reliable for tasks like video editing and image-to-image translation, where small changes need to look natural and stay consistent.

Abstract

Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This limits their use in applications that require consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant. While introducing anti-aliasing operations can partially improve shift-equivariance, significant aliasing and inconsistency persist due to challenges unique to LDMs, including 1) aliasing amplification during VAE training and multiple U-Net inferences, and 2) self-attention modules that inherently lack shift-equivariance. To address these issues, we redesign the attention modules to be shift-equivariant and propose an equivariance loss that effectively suppresses the frequency bandwidth of the features in the continuous domain. The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is also robust to irregular warping. Extensive experiments demonstrate that AF-LDM produces significantly more consistent results than vanilla LDM across various applications, including video editing and image-to-image translation. Code is available at: https://github.com/SingleZombie/AFLDM
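The abstract does not spell out the loss, so the following is only a generic form of an equivariance loss, written under stated assumptions: f is a network component (for example the VAE encoder or one U-Net pass), T_Δ is a fractional shift by Δ pixels, and s is the downsampling factor between f's input and output (s = 1 when they share a resolution). The paper's exact formulation may differ.

```latex
\mathcal{L}_{\text{eq}}
  = \mathbb{E}_{x,\,\Delta}\Big[\big\lVert f\big(T_{\Delta}(x)\big) - T_{\Delta/s}\big(f(x)\big)\big\rVert_2^2\Big]
```

Intuitively, f(T_Δ x) can match T_{Δ/s} f(x) for every fractional Δ only if f's features are band-limited, which is how a loss of this kind pushes down the frequency bandwidth of the features.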
