Test-time scaling of diffusions with flow maps

Amirmojtaba Sabour, Michael S. Albergo, Carles Domingo-Enrich, Nicholas M. Boffi, Sanja Fidler, Karsten Kreis, Eric Vanden-Eijnden

2025-12-01

Summary

This paper introduces a new method, called Flow Map Trajectory Tilting (FMTT), to improve how diffusion models generate images based on what a user wants. It's about guiding the image creation process to get results that score highly according to a specific 'reward' or preference.

What's the problem?

When you try to steer a diffusion model with a user-defined reward, there's a tricky issue. The reward function usually only makes sense for *finished* images, but the diffusion model builds the image step by step. So how do you tell the model how rewarding the image will be *before* it is complete? Existing methods use a denoiser to guess what the final image would look like and score that guess, but early in generation the guess is often inaccurate, so the guidance misfires.
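To make the failure mode concrete, here is a minimal 1D sketch of this denoiser look-ahead guidance. Everything is a toy stand-in, not the paper's models: the quadratic `reward_grad`, the closed-form `denoiser`, and the drift `-0.5 * x_t` are all hypothetical.

```python
import numpy as np

def reward_grad(x0, target=1.0):
    # Gradient of a toy reward r(x0) = -(x0 - target)^2, which only
    # makes sense for a *finished* image x0.
    return -2.0 * (x0 - target)

def denoiser(x_t, t):
    # Hypothetical one-shot guess of the final image from the state at
    # time t; real denoisers are learned networks and imperfect, and
    # the guess is worst early in generation (small t).
    return x_t * np.exp(-0.5 * (1.0 - t))

def naive_guided_step(x_t, t, dt, scale=0.5):
    # Common workaround: score the denoiser's *guess* of the endpoint
    # and add that gradient to the dynamics. When the guess is off,
    # so is the guidance.
    x0_hat = denoiser(x_t, t)
    return x_t + dt * (-0.5 * x_t + scale * reward_grad(x0_hat))
```

When the guess is poor, the injected gradient points in the wrong direction, which is the inaccuracy described above.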

What's the solution?

The researchers realized they could use a 'flow map', a function that jumps a partially generated image directly to where it will land at the end of the diffusion process. By exploiting the relationship between the flow map and the velocity field that drives each step, their FMTT method pulls the reward's gradient back through the flow map, which provably ascends the reward better than simply adding the reward's gradient to the dynamics. The method can either sample exactly from the reward-tilted distribution (via importance weighting) or search for images that locally maximize the reward.
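A minimal 1D sketch of the flow-map idea, assuming toy dynamics `v(x, t) = C * x` whose flow map and Jacobian are known in closed form; the quadratic reward and every function name here are hypothetical stand-ins, not the paper's FMTT algorithm.

```python
import numpy as np

C = -0.5  # decay rate of the toy linear velocity v(x, t) = C * x

def flow_map(x_t, t):
    # Closed-form flow map of the toy dynamics: where x_t lands at t = 1.
    return x_t * np.exp(C * (1.0 - t))

def flow_map_jac(x_t, t):
    # Jacobian of the flow map w.r.t. x_t (a scalar here; in general a
    # vector-Jacobian product through a learned network).
    return np.exp(C * (1.0 - t))

def reward_grad(x0, target=1.0):
    # Gradient of the toy reward r(x0) = -(x0 - target)^2.
    return -2.0 * (x0 - target)

def tilted_step(x_t, t, dt, scale=0.5):
    # Pull the reward gradient back through the flow map (chain rule),
    # so the update ascends the reward of the flow map's endpoint
    # rather than of a denoiser guess.
    x_end = flow_map(x_t, t)
    ascent = flow_map_jac(x_t, t) * reward_grad(x_end)
    return x_t + dt * (C * x_t + scale * ascent)
```

Run from t = 0 to t = 1 and the tilted trajectory lands closer to the reward's maximizer than the unguided one, illustrating the ascent property in this toy setting.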

Why it matters?

This new approach is important because it allows diffusion models to respond to complex user preferences more accurately. It opens the door to more sophisticated image editing, like using instructions from a vision-language model to change an image in specific ways. Essentially, it makes it easier to get the AI to create exactly what you want, even with complicated requests.

Abstract

A common recipe to improve diffusion models at test-time so that samples score highly against a user-specified reward is to introduce the gradient of the reward into the dynamics of the diffusion itself. This procedure is often ill posed, as user-specified rewards are usually only well defined on the data distribution at the end of generation. While common workarounds to this problem are to use a denoiser to estimate what a sample would have been at the end of generation, we propose a simple solution to this problem by working directly with a flow map. By exploiting a relationship between the flow map and velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods involving the gradient of the reward. The approach can be used to either perform exact sampling via importance weighting or principled search that identifies local maximizers of the reward-tilted distribution. We demonstrate the efficacy of our approach against other look-ahead techniques, and show how the flow map enables engagement with complicated reward functions that make possible new forms of image editing, e.g. by interfacing with vision language models.
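The abstract's "exact sampling via importance weighting" can be illustrated with self-normalized importance sampling: push base noise through a flow map, then reweight the resulting samples by the exponentiated reward to estimate expectations under the reward-tilted distribution. The affine `flow_map` and quadratic `reward` below are toy stand-ins, not the paper's learned models.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x0, target=1.0):
    # Toy reward defined on finished samples only.
    return -(x0 - target) ** 2

def flow_map(z):
    # Stand-in flow map sending base noise straight to model samples;
    # a fixed affine map here, a learned network in practice.
    return 0.5 * z + 1.2

def tilted_expectation(f, n=10_000):
    # Self-normalized importance sampling: weight each base sample by
    # exp(reward) so the weighted average targets the tilted law
    # p(x) * exp(r(x)) / Z instead of the model's own law p(x).
    z = rng.standard_normal(n)
    x = flow_map(z)
    w = np.exp(reward(x))
    w /= w.sum()
    return np.sum(w * f(x))
```

In this toy, the base samples are centered at 1.2 while the reward peaks at 1.0, so the tilted mean is pulled toward 1.0.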