
Flow Map Distillation Without Data

Shangyuan Tong, Nanye Ma, Saining Xie, Tommi Jaakkola

2025-11-25


Summary

This paper focuses on making generative models, specifically flow models, faster at creating images. These models produce excellent image quality, but generating each image is slow because sampling requires many iterative steps. The paper introduces a new way to speed them up without losing quality.

What's the problem?

Currently, to make flow models faster, researchers 'distill' them into 'flow maps' that can jump from noise to an image in far fewer steps. However, the standard way to do this requires sampling from a large dataset of example images to train the flow map. The problem is that a fixed set of example images might not fully represent everything the original, more complex teacher model can do. This mismatch between the training data and the teacher's capabilities, which the authors call Teacher-Data Mismatch, can limit how well the faster flow map performs.

What's the solution?

The researchers came up with a way to train these faster flow maps *without* any dataset of example images. Instead, the flow map samples only random noise from the prior distribution and learns to predict the teacher's sampling path, i.e., how the original model would turn that noise into an image. They also built in a mechanism for the flow map to correct its own compounding errors as it learns, which keeps it accurate even when taking very few steps. Essentially, the student is taught to mimic the teacher's generative process directly, rather than its outputs on a fixed dataset.
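To make that idea concrete, here is a minimal, hypothetical sketch of data-free flow map distillation in PyTorch. It is not the authors' exact objective: the Euler rollout of the teacher, the single-jump regression loss, and all names (`teacher_velocity`, `flow_map`, `distill_step`) are illustrative assumptions, and the paper's own framework additionally corrects for the student's compounding errors in a principled way.

```python
# Hypothetical sketch of data-free flow map distillation (not the paper's exact method).
import torch

def teacher_ode_step(teacher_velocity, x, t, dt):
    # One Euler step of the teacher's probability-flow ODE: x <- x + dt * v(x, t).
    return x + dt * teacher_velocity(x, t)

def distill_step(teacher_velocity, flow_map, optimizer, batch_size, dim,
                 n_teacher_steps=8, device="cpu"):
    # One data-free training step: only prior noise is sampled, no external dataset.
    x1 = torch.randn(batch_size, dim, device=device)  # sample from the prior

    # Roll out the frozen teacher's sampling path from t=1 (noise) to t=0 (data).
    with torch.no_grad():
        x = x1
        t = torch.ones(batch_size, device=device)
        dt = -1.0 / n_teacher_steps
        for _ in range(n_teacher_steps):
            x = teacher_ode_step(teacher_velocity, x, t, dt)
            t = t + dt
        teacher_endpoint = x

    # The student flow map predicts the same endpoint in a single jump f(x1, t=1, s=0).
    ones = torch.ones(batch_size, device=device)
    zeros = torch.zeros(batch_size, device=device)
    student_endpoint = flow_map(x1, ones, zeros)
    loss = ((student_endpoint - teacher_endpoint) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point the sketch illustrates is that every training input comes from the prior distribution the teacher is built on, so there is no fixed dataset that could misrepresent what the teacher can generate.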

Why it matters?

This research matters because it removes a major bottleneck in accelerating flow models: the need for an external dataset to distill them. By showing that a fast flow map can be distilled without any external data, it opens the door to more efficient and more reliable image generation. Their method outperforms previous data-based approaches while using as little as a single sampling step, reaching an FID of 1.45 on ImageNet 256x256, and sets a new state of the art for this type of acceleration.

Abstract

State-of-the-art flow models achieve remarkable quality but require slow, iterative sampling. To accelerate this, flow maps can be distilled from pre-trained teachers, a procedure that conventionally requires sampling from an external dataset. We argue that this data-dependency introduces a fundamental risk of Teacher-Data Mismatch, as a static dataset may provide an incomplete or even misaligned representation of the teacher's full generative capabilities. This leads us to question whether this reliance on data is truly necessary for successful flow map distillation. In this work, we explore a data-free alternative that samples only from the prior distribution, a distribution the teacher is guaranteed to follow by construction, thereby circumventing the mismatch risk entirely. To demonstrate the practical viability of this philosophy, we introduce a principled framework that learns to predict the teacher's sampling path while actively correcting for its own compounding errors to ensure high fidelity. Our approach surpasses all data-based counterparts and establishes a new state-of-the-art by a significant margin. Specifically, distilling from SiT-XL/2+REPA, our method reaches an impressive FID of 1.45 on ImageNet 256x256, and 1.49 on ImageNet 512x512, both with only 1 sampling step. We hope our work establishes a more robust paradigm for accelerating generative models and motivates the broader adoption of flow map distillation without data.
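For intuition on where the speed-up comes from: a distilled flow map replaces the teacher's many iterative ODE steps with a single function evaluation at sampling time. A hypothetical usage sketch, assuming the `flow_map` interface from the training sketch above:

```python
# Hypothetical one-step sampling with a distilled flow map (illustrative interface only).
import torch

def sample_one_step(flow_map, num_samples, dim, device="cpu"):
    # Draw pure noise from the prior and map it directly to a sample in one call.
    x1 = torch.randn(num_samples, dim, device=device)
    t = torch.ones(num_samples, device=device)    # start of the path (noise)
    s = torch.zeros(num_samples, device=device)   # end of the path (data)
    with torch.no_grad():
        return flow_map(x1, t, s)
```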