Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

Minghao Fu, Guo-Hua Wang, Tianyu Cui, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang

2025-11-11

Summary

This paper focuses on improving how well AI image generators create pictures that people actually like, building on a technique called Direct Preference Optimization (DPO).

What's the problem?

When trying to make AI image generators better by directly telling them which images people prefer, simply widening the gap in 'score' between preferred and disliked images doesn't always work. In fact, it can sometimes make *both* the good and bad images worse: the process of pushing the disliked images away can unintentionally degrade the good ones too, so the AI starts 'forgetting' how to create realistic images while trying to avoid making the disliked ones.

What's the solution?

The researchers developed a new method called Diffusion-SDPO. It's a smarter way to update the AI's settings. Instead of aggressively changing the settings for disliked images, it carefully scales those changes based on how similar they are to the changes needed for the preferred images. This ensures the good images don't get worse while still trying to improve the bad ones, guaranteeing the preferred image quality doesn't decrease with each step of improvement. It’s a relatively simple adjustment that doesn’t require a lot of extra computing power.
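The core idea can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual implementation: it assumes the preferred ("winner") and disliked ("loser") branches each contribute a gradient (`g_w`, `g_l` here are hypothetical names), and it scales the loser gradient so that, to first order, the combined update cannot increase the winner's error. The paper derives the exact closed-form coefficient; the cap below is just one way to satisfy that first-order condition.

```python
import numpy as np

def safeguarded_update(g_w, g_l, alpha_max=1.0, eps=1e-12):
    """Illustrative safeguarded DPO-style update (not the paper's exact rule).

    g_w: gradient that decreases the winner's reconstruction error
    g_l: gradient from the loser branch, which may conflict with g_w
    Returns the combined gradient and the scaling coefficient applied
    to the loser term.
    """
    dot = np.dot(g_w, g_l)
    if dot >= 0:
        # Loser gradient is aligned with the winner: keep it at full strength.
        alpha = alpha_max
    else:
        # Conflicting directions: cap alpha so that
        # <g_w, g_w + alpha * g_l> >= 0,
        # i.e. the winner's error is non-increasing to first order.
        alpha = min(alpha_max, np.dot(g_w, g_w) / (-dot + eps))
    return g_w + alpha * g_l, alpha
```

For example, with `g_w = [1, 0]` and a strongly conflicting `g_l = [-2, 0]`, the coefficient is capped near 0.5, so the combined step no longer moves against the winner; when the two gradients agree, the loser term passes through unscaled.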

Why it matters?

This research is important because it makes AI image generation more reliable and effective. By preventing the 'good' images from being ruined during the learning process, it leads to consistently better results, as shown by improvements in automated tests measuring image quality, how much they match the text prompt, and how aesthetically pleasing they are. This means AI can create images that are more aligned with what people actually want.

Abstract

Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology: enlarging the preference margin does not necessarily improve generation quality. In particular, the standard Diffusion-DPO objective can increase the reconstruction error of both winner and loser branches. Consequently, degradation of the less-preferred outputs can become sufficiently severe that the preferred branch is also adversely affected even as the margin grows. To address this, we introduce Diffusion-SDPO, a safeguarded update rule that preserves the winner by adaptively scaling the loser gradient according to its alignment with the winner gradient. A first-order analysis yields a closed-form scaling coefficient that guarantees the error of the preferred output is non-increasing at each optimization step. Our method is simple, model-agnostic, broadly compatible with existing DPO-style alignment frameworks and adds only marginal computational overhead. Across standard text-to-image benchmarks, Diffusion-SDPO delivers consistent gains over preference-learning baselines on automated preference, aesthetic, and prompt alignment metrics. Code is publicly available at https://github.com/AIDC-AI/Diffusion-SDPO.