
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

Yang Chen, Xiaowei Xu, Shuai Wang, Chenhui Zhu, Ruxue Wen, Xubin Li, Tiezheng Ge, Limin Wang

2025-12-04


Summary

This paper introduces a new way to train Normalizing Flows, which are a type of machine learning model used to generate new data that looks similar to the data they were trained on, like creating realistic images.

What's the problem?

Normalizing Flows are good at both understanding data and creating new data, but traditionally they haven't been great at creating *high-quality* new data. That's because standard log-likelihood training only rewards matching the overall statistics of the training data; it doesn't push the model toward representations that capture what images actually mean visually or semantically.

What's the solution?

The researchers came up with a clever trick that exploits the fact that Normalizing Flows are invertible. Instead of regularizing how the model transforms data *into* its internal representation (the forward pass), they improved how it transforms latents *back out* to create new samples (the reverse, generative pass). They did this by aligning the intermediate features of that reverse pass with representations from a powerful, already well-trained image understanding model. They also created a training-free way to test, at inference time, whether the model truly understands what it's generating.
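The core idea can be sketched in a few lines. The snippet below uses a toy invertible affine flow and a random projection as a stand-in for the vision foundation model; all names here (`inverse_block`, `cosine_alignment_loss`, `W`) are illustrative assumptions, not the paper's actual code or loss.

```python
import numpy as np

def forward_block(x, scale, shift):
    """Forward pass of one toy flow block: data -> latent."""
    return x * np.exp(scale) + shift

def inverse_block(z, scale, shift):
    """Reverse (generative) pass of the same block: latent -> data."""
    return (z - shift) * np.exp(-scale)

def cosine_alignment_loss(student, teacher):
    """1 - mean cosine similarity between flow features and teacher features."""
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return 1.0 - float((s * t).sum(axis=-1).mean())

rng = np.random.default_rng(0)
dim, teacher_dim = 8, 4
params = [(0.1 * rng.normal(size=dim), 0.1 * rng.normal(size=dim))
          for _ in range(3)]

# Run the generative (reverse) pass from latent samples and grab an
# intermediate feature to align.
z = rng.normal(size=(2, dim))
h = z
for scale, shift in reversed(params):
    h = inverse_block(h, scale, shift)
intermediate = h

# Project into the (stand-in) foundation-model feature space and align.
W = rng.normal(size=(dim, teacher_dim))
teacher_features = rng.normal(size=(2, teacher_dim))  # placeholder for a frozen vision encoder
loss = cosine_alignment_loss(intermediate @ W, teacher_features)

# Invertibility sanity check: pushing the generated sample back through the
# forward pass recovers the original latent exactly.
x = intermediate
for scale, shift in params:
    x = forward_block(x, scale, shift)
assert np.allclose(x, z)
```

The invertibility check at the end is what makes this strategy possible at all: because the reverse pass is an exact inverse of the forward pass, a loss applied to reverse-pass features still shapes the one shared set of weights.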

Why it matters?

This new approach makes training Normalizing Flows much faster – over three times faster in their experiments – and significantly improves the quality of the images they generate. It also makes them better at tasks like image classification, showing they've learned a more meaningful understanding of the data. This work sets a new standard for how well Normalizing Flows can perform on complex image datasets like ImageNet.

Abstract

Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from this space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by poor semantic representations from log-likelihood optimization. To remedy this, we propose a novel alignment strategy that creatively leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, demonstrating superior effectiveness over naive alignment. We also introduce a novel training-free, test-time optimization algorithm for classification, which provides a more intrinsic evaluation of the NF's embedded semantic knowledge. Comprehensive experiments demonstrate that our approach accelerates the training of NFs by over 3.3×, while simultaneously delivering significant improvements in both generative quality and classification accuracy. New state-of-the-art results for NFs are established on ImageNet 64×64 and 256×256. Our code is available at https://github.com/MCG-NJU/FlowBack.
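The abstract's training-free classification builds on a textbook idea: a density model can classify by scoring an input under each class-conditional density and picking the argmax. The sketch below shows only that Bayes-classifier skeleton with toy Gaussian densities standing in for a class-conditional flow; the paper's actual test-time optimization algorithm is not reproduced here, and `class_log_density` is a hypothetical placeholder.

```python
import numpy as np

def class_log_density(x, mean):
    """Stand-in for log p(x | class): an isotropic Gaussian log-density.
    In the real setting this would be the flow's exact log-likelihood."""
    d = x.size
    return -0.5 * (np.sum((x - mean) ** 2) + d * np.log(2 * np.pi))

def classify(x, class_means):
    """Assign x to the class whose density model scores it highest."""
    scores = [class_log_density(x, m) for m in class_means]
    return int(np.argmax(scores))

class_means = [np.zeros(2), np.full(2, 5.0)]  # two toy classes
pred = classify(np.array([4.8, 5.1]), class_means)  # nearest to class 1's mode
```

No classifier head is trained at any point; the density model's own likelihoods do all the work, which is why this kind of evaluation probes the semantics the flow itself has absorbed.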