
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference

Xiangwei Shen, Zhimin Li, Zhantao Yang, Shiyi Zhang, Yingfang Zhang, Donghao Li, Chunyu Wang, Qinglin Lu, Yansong Tang

2025-09-10


Summary

This paper focuses on improving how we teach AI image generators, called diffusion models, to create images that people actually like. It builds on recent work that fine-tunes these models directly against a differentiable 'reward' model (a scorer that predicts human preference), and tackles two key weaknesses of that approach.

What's the problem?

Current methods for aligning AI image generation with human preferences are slow and computationally expensive: to score an image with the reward model, they run many denoising steps and keep the gradients for all of them, so in practice they can only optimize a few of the diffusion steps. They also often require the reward model itself to be fine-tuned offline, again and again, to reach specific qualities such as photorealism or precise lighting. As a result, getting the model to produce consistently good images takes considerable effort and compute.

What's the solution?

The researchers developed a two-part solution. First, 'Direct-Align' fixes a noise sample in advance (a 'noise prior'). Because every intermediate state of the diffusion process is an interpolation between noise and the target image, knowing that noise lets the method recover the clean image from any timestep directly, instead of running many slow denoising steps with gradients. Second, 'Semantic Relative Preference Optimization' (SRPO) turns the reward into a text-conditioned signal: by adding positive and negative words to the prompt, the reward can be adjusted online during training, which reduces the need to fine-tune the reward model offline. They applied both techniques to fine-tune an existing image generator, FLUX.1.dev.
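A toy illustration of the interpolation idea behind Direct-Align, offered as a minimal sketch under stated assumptions. It assumes a linear, rectified-flow-style schedule x_t = (1 - t) * x0 + t * eps; the paper only says diffusion states are interpolations between noise and image, so these exact coefficients are an assumption. Lists of floats stand in for image tensors, and the helper names interpolate and recover are hypothetical. What it shows is that when the noise eps is fixed in advance, the clean image can be recovered from any timestep t < 1 in one algebraic step. It leaves out the denoising network and the reward gradient entirely, so it is not the authors' training procedure.

```python
import random

def interpolate(x0, eps, t):
    """Noisy state at time t under the ASSUMED linear schedule x_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]

def recover(xt, eps, t):
    """Invert the interpolation in one step, given the predefined noise eps (requires t < 1)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    return [(x - t * e) / (1.0 - t) for x, e in zip(xt, eps)]

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # toy stand-in for a clean image
eps = [rng.gauss(0.0, 1.0) for _ in range(4)]    # noise prior, fixed before any timestep is chosen

for t in (0.1, 0.5, 0.9):
    x_hat = recover(interpolate(x0, eps, t), eps, t)
    err = max(abs(a - b) for a, b in zip(x_hat, x0))
    print(f"t={t}: max reconstruction error {err:.2e}")
```

Because the same eps is used to build x_t and to invert it, the recovery is exact up to floating-point rounding at every t tried, early or late; this is the property that lets a method skip the long chain of denoising steps when it only needs an image estimate to score.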

Why does it matter?

This work is important because it makes it more practical to train AI image generators that consistently produce high-quality, aesthetically pleasing images. By making the optimization cheaper and reducing the need to retune the reward model, it opens the door to more accessible and efficient AI image creation; in the authors' human evaluation, the fine-tuned FLUX.1.dev improved in realism and aesthetic quality by more than 3x.

Abstract

Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
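A minimal sketch of one way to read the 'relative', text-conditioned reward in SRPO, under loud assumptions. The reward model is assumed to be CLIP-style, scoring an image by the cosine similarity between its embedding and a prompt embedding, and the relative reward is taken as the score under a positively augmented prompt minus the score under a negatively augmented one. The embeddings are hand-made 3-dimensional toy vectors standing in for hypothetical image and text encoders, and the function names are illustrative, not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def relative_reward(image_emb, pos_prompt_emb, neg_prompt_emb):
    """Hypothetical SRPO-style signal: score with the positive prompt minus score with the negative prompt."""
    return cosine(image_emb, pos_prompt_emb) - cosine(image_emb, neg_prompt_emb)

# Toy embeddings (ASSUMED, not produced by any real encoder).
# pos: the prompt augmented with a positive control word, e.g. "realistic photo"
# neg: the same prompt augmented with a negative control word, e.g. "cartoon"
pos_prompt_emb = [1.0, 0.0, 0.5]
neg_prompt_emb = [0.0, 1.0, 0.5]

photo_like = [0.9, 0.1, 0.5]
cartoon_like = [0.1, 0.9, 0.5]

print(relative_reward(photo_like, pos_prompt_emb, neg_prompt_emb))    # > 0
print(relative_reward(cartoon_like, pos_prompt_emb, neg_prompt_emb))  # < 0
```

Anything the reward model scores identically under both prompts cancels in the difference, so in this reading the direction of the reward can be changed online just by swapping the control words, without retraining the reward model.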