Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching
Junho Lee, Kwanseok Kim, Joonseok Lee
2025-12-23
Summary
This paper investigates how the choice of source (starting) distribution affects the performance of flow matching, a generative modeling approach used to create new data similar to existing data.
What's the problem?
Flow matching usually starts from a simple, well-understood distribution like a Gaussian (normal) distribution, but it is unclear whether this is the *best* starting point, especially for complex, high-dimensional data. Surprisingly, the researchers found that making the source distribution closely match the real data's density can actually hurt performance. They also found that how well the source distribution's directions and norms (scales) align with the data strongly affects how stably the model learns.
What's the solution?
To understand this better, the researchers designed a simplified 2D simulation that reproduces the geometric challenges of high-dimensional data in an interpretable setting. Through this simulation, they found that a good strategy combines 'norm-aligned training' (matching the scale, or norm, of source samples to the data) with 'directionally-pruned sampling' (avoiding starting the generation process from directions where there is little data). The pruning step is applied only at inference time, so it can be added to existing flow matching models without retraining them from scratch.
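As a rough illustration of the pruning idea (a sketch, not the authors' exact method; the threshold and rejection scheme here are assumptions), one could draw Gaussian noise and keep only samples whose direction is close to the direction of at least one training example, measured by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for training data: we only need its directions (unit vectors).
data = rng.normal(size=(100, 2))
data_dirs = data / np.linalg.norm(data, axis=1, keepdims=True)

def sample_pruned_gaussian(n, data_dirs, cos_threshold=0.9, batch=512):
    """Draw Gaussian noise and keep only samples whose direction has
    cosine similarity >= cos_threshold to at least one data direction."""
    kept = []
    while sum(len(k) for k in kept) < n:
        z = rng.normal(size=(batch, data_dirs.shape[1]))
        z_dirs = z / np.linalg.norm(z, axis=1, keepdims=True)
        sims = z_dirs @ data_dirs.T  # pairwise cosine similarities
        kept.append(z[sims.max(axis=1) >= cos_threshold])
    return np.concatenate(kept)[:n]

# These pruned noise samples would then replace plain Gaussian draws
# as the starting points for the (already trained) flow's ODE solver.
samples = sample_pruned_gaussian(64, data_dirs)
```

Because only the sampling of starting points changes, this kind of filter composes with any pretrained Gaussian-source flow without touching its weights.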
Why it matters?
This work provides practical advice for designing better flow matching models. It shows that the choice of starting distribution isn't just about matching the data's shape, but also about ensuring stable learning and efficient sampling. The pruning technique offers a simple way to improve the performance of existing models, making them generate higher-quality data faster.
Abstract
Flow matching has emerged as a powerful generative modeling approach with flexible choices of source distribution. While Gaussian distributions are commonly used, the potential for better alternatives in high-dimensional data generation remains largely unexplored. In this paper, we propose a novel 2D simulation that captures high-dimensional geometric properties in an interpretable 2D setting, enabling us to analyze the learning dynamics of flow matching during training. Based on this analysis, we derive several key insights about flow matching behavior: (1) density approximation can paradoxically degrade performance due to mode discrepancy, (2) directional alignment suffers from path entanglement when overly concentrated, (3) Gaussian's omnidirectional coverage ensures robust learning, and (4) norm misalignment incurs substantial learning costs. Building on these insights, we propose a practical framework that combines norm-aligned training with directionally-pruned sampling. This approach maintains the robust omnidirectional supervision essential for stable flow learning, while eliminating initializations in data-sparse regions during inference. Importantly, our pruning strategy can be applied to any flow matching model trained with a Gaussian source, providing immediate performance gains without the need for retraining. Empirical evaluations demonstrate consistent improvements in both generation quality and sampling efficiency. Our findings provide practical insights and guidelines for source distribution design and introduce a readily applicable technique for improving existing flow matching models. Our code is available at https://github.com/kwanseokk/SourceFM.