
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Ziqi Pang, Xin Xu, Yu-Xiong Wang

2025-04-16


Summary

This paper talks about how improving the way diffusion models clean up images during their generation process can make them much better at understanding and analyzing pictures, not just creating them.

What's the problem?

The problem is that while diffusion models are great at generating realistic images, they usually aren't as good at tasks where the AI needs to recognize or analyze what's in an image, like estimating depth or picking out specific objects. This is because the way these models clean up noisy images (called denoising) is mostly focused on making the pictures look good, not on making them easy for the AI to understand.

What's the solution?

The researchers changed the training process for diffusion models so that, while the model cleans up the image, it is also rewarded for how accurately it recognizes and understands different parts of the picture. By aligning the denoising process with what the model needs for tasks like depth estimation or image segmentation, the model becomes much better at these jobs.
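
To make that idea concrete, here is a minimal sketch of what a combined training step could look like in PyTorch. This is an illustration, not the paper's actual code: the interfaces `diffusion_model` (returning a noise prediction plus intermediate features), `depth_head`, and the simple linear noise schedule are all assumptions made for clarity.

```python
import torch
import torch.nn.functional as F

def training_step(diffusion_model, depth_head, image, gt_depth, task_weight=1.0):
    """One combined update: generative denoising loss + discriminative task loss.
    `diffusion_model` and `depth_head` are hypothetical placeholders."""
    batch = image.shape[0]

    # Standard diffusion setup: mix the clean image with noise at a random strength.
    noise = torch.randn_like(image)
    t = torch.rand(batch, 1, 1, 1, device=image.device)  # noise level in [0, 1)
    noisy_image = (1 - t) * image + t * noise             # simple linear schedule for illustration

    # The backbone predicts the noise (generative objective) and returns
    # intermediate features that a small head turns into a depth map.
    pred_noise, features = diffusion_model(noisy_image, t)
    denoise_loss = F.mse_loss(pred_noise, noise)

    # Discriminative objective: supervise the perception output directly,
    # so the denoising process is pushed toward task-relevant detail.
    pred_depth = depth_head(features)
    task_loss = F.l1_loss(pred_depth, gt_depth)

    return denoise_loss + task_weight * task_loss
```

The key point this sketch tries to capture is that the same denoising network is supervised by both losses, so the cleanup it learns is shaped by what the perception task needs, not only by image quality.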

Why it matters?

This matters because it means one AI model can both create high-quality images and also do a great job understanding them, which is important for things like self-driving cars, robotics, and any technology that needs to both see and make sense of the world.

Abstract

Enhancements to generative diffusion models address gaps in discriminative tasks by focusing on perception quality during denoising, improving performance on depth estimation, referring image segmentation, and generalist perception tasks.