Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator

Kaiwen Zheng, Yongxin Chen, Huayu Chen, Guande He, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang

2025-03-04

Summary

This paper introduces Direct Discriminative Optimization (DDO), a new way to improve AI models that generate images. It shows that these models can be made better by also treating them as judges of real versus fake images, the way a GAN discriminator works.

What's the problem?

Current AI models that create images are good, but there is a limit on how realistic their images can get. This comes from the standard way they are trained (maximum likelihood), which pushes them to cover all possible types of images instead of focusing on the most realistic ones.

What's the solution?

The researchers created DDO, which changes how these models are finetuned. Instead of only trying to cover every kind of image, DDO teaches the model to tell its own images apart from real ones. It does this by comparing how likely the model being trained thinks an image is versus how likely a fixed reference copy of the model thinks it is. DDO can be applied to models that are already trained, improving them without having to start from scratch.
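To make the idea concrete, here is a toy sketch of the core trick: a GAN-style discriminator whose score (logit) is the log-likelihood ratio between a target model and a fixed reference model, rather than a separately trained network. The Gaussian models, sample values, and function names below are illustrative assumptions for this sketch, not the paper's actual setup.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def gaussian_logpdf(x, mu, sigma):
    # Log-density of a 1D Gaussian, standing in for a generative model's
    # log-likelihood (hypothetical stand-in for this sketch).
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def ddo_style_loss(real_xs, fake_xs, log_p_target, log_p_ref):
    """GAN-discriminator-style loss where the discriminator logit is the
    likelihood ratio log p_target(x) - log p_ref(x), so no separate
    discriminator network is needed."""
    logit = lambda x: log_p_target(x) - log_p_ref(x)
    # Real images should get a high ratio, model-generated ones a low ratio.
    loss_real = -sum(log_sigmoid(logit(x)) for x in real_xs) / len(real_xs)
    loss_fake = -sum(log_sigmoid(-logit(x)) for x in fake_xs) / len(fake_xs)
    return loss_real + loss_fake

# Toy setup: the target model sits near the real data (around 0),
# the fixed reference sits where the fake samples are (around 1).
log_p_target = lambda x: gaussian_logpdf(x, 0.0, 1.0)
log_p_ref = lambda x: gaussian_logpdf(x, 1.0, 1.0)
real = [0.1, -0.2, 0.05]
fake = [0.9, 1.1, 1.2]
loss = ddo_style_loss(real, fake, log_p_target, log_p_ref)
```

In the real method this loss would be minimized with gradients with respect to the target model's parameters; the toy example only evaluates it, showing that a target model aligned with the real data scores a lower loss than a misaligned one.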

Why it matters?

This matters because it helps AI generate more realistic images without needing more powerful computers. The researchers showed that their method made significant improvements to existing top-performing AI models, setting new records for image quality on standard tests. This could lead to better AI-generated images for things like art, design, and even helping to create more realistic virtual worlds.

Abstract

While likelihood-based generative models, particularly diffusion and autoregressive models, have achieved remarkable fidelity in visual generation, the maximum likelihood estimation (MLE) objective inherently suffers from a mode-covering tendency that limits the generation quality under limited model capacity. In this work, we propose Direct Discriminative Optimization (DDO) as a unified framework that bridges likelihood-based generative training and the GAN objective to bypass this fundamental constraint. Our key insight is to parameterize a discriminator implicitly using the likelihood ratio between a learnable target model and a fixed reference model, drawing parallels with the philosophy of Direct Preference Optimization (DPO). Unlike GANs, this parameterization eliminates the need for joint training of generator and discriminator networks, allowing for direct, efficient, and effective finetuning of a well-trained model to its full potential beyond the limits of MLE. DDO can be performed iteratively in a self-play manner for progressive model refinement, with each round requiring less than 1% of pretraining epochs. Our experiments demonstrate the effectiveness of DDO by significantly advancing the previous SOTA diffusion model EDM, reducing FID scores from 1.79/1.58 to new records of 1.30/0.97 on CIFAR-10/ImageNet-64 datasets, and by consistently improving both guidance-free and CFG-enhanced FIDs of visual autoregressive models on ImageNet 256×256.
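The implicit discriminator described in the abstract can be written out as a likelihood ratio. The notation below is an illustrative rendering of that idea (with $p_\theta$ the learnable target, $p_{\mathrm{ref}}$ the fixed reference, and a standard GAN-style logistic loss), not the paper's exact equations.

```latex
% Discriminator logit parameterized by the likelihood ratio:
d_\theta(x) = \log \frac{p_\theta(x)}{p_{\mathrm{ref}}(x)}

% GAN-style objective, minimized over the target model's parameters only:
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log \sigma\!\big(d_\theta(x)\big)\right]
    -\,\mathbb{E}_{x \sim p_{\mathrm{ref}}}\!\left[\log \sigma\!\big(-\,d_\theta(x)\big)\right]
```

Because $d_\theta$ is built entirely from the two models' likelihoods, no separate discriminator network exists, which is the parallel the abstract draws with DPO's implicit reward.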