FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli

2024-12-12


Summary

This paper presents FlowEdit, a method for editing real images with text prompts using pre-trained text-to-image flow models. It does not invert the image into noise and does not run any optimization, which makes it a simpler route to high-quality edits than inversion-based approaches.

What's the problem?

Existing text-based editing methods built on pre-trained text-to-image models usually start by inverting the image into its corresponding noise map. Inversion alone rarely gives satisfactory edits, so many methods also intervene in the sampling process. These interventions improve results, but they are tied to a particular model architecture, which makes it hard to reuse the same method across different models.
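To make the inversion baseline concrete, here is a minimal sketch in Python/NumPy. It assumes a rectified-flow convention, z_t = (1 - t) x_0 + t ε, and replaces the text-to-image network with a hypothetical toy model in which each "prompt" is just a Gaussian N(mu, s²I) whose exact velocity field has a closed form. The helper names (`toy_velocity`, `integrate`, `edit_by_inversion`) are illustrative and do not come from the paper or any library. Editing by inversion integrates the ODE from image to noise under the source prompt, then from noise back to an image under the target prompt.

```python
import numpy as np


def toy_velocity(z, t, mu, s):
    """Hypothetical stand-in for a text-conditioned flow model.

    Exact marginal velocity E[eps - x0 | z_t = z] for z_t = (1 - t) * x0 + t * eps,
    with x0 ~ N(mu, s^2 I) and eps ~ N(0, I). The "prompt" is just (mu, s).
    """
    var_t = (1 - t) ** 2 * s ** 2 + t ** 2
    gain = (t - (1 - t) * s ** 2) / var_t
    return gain * (z - (1 - t) * mu) - mu


def integrate(z, t_from, t_to, mu, s, steps):
    """Euler-integrate dz/dt = v(z, t) from t_from to t_to under prompt (mu, s)."""
    ts = np.linspace(t_from, t_to, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        z = z + (t_next - t_cur) * toy_velocity(z, t_cur, mu, s)
    return z


def edit_by_inversion(x_src, src, tar, steps=1000):
    """Baseline: invert the image to noise with the source prompt, then resample with the target."""
    mu_src, s_src = src
    mu_tar, s_tar = tar
    x_src = np.asarray(x_src, dtype=float)
    noise = integrate(x_src, 0.0, 1.0, mu_src, s_src, steps)  # image -> noise (t: 0 -> 1)
    return integrate(noise, 1.0, 0.0, mu_tar, s_tar, steps)   # noise -> edited image (t: 1 -> 0)


x = np.array([0.3, -1.2])
edited = edit_by_inversion(x, (np.zeros(2), 0.5), (np.array([1.0, 0.0]), 1.5))
print(edited)  # close to [1.9, -3.6], the exact map for this toy
```

In this toy the exact edit is mu_tar + s_tar (x - mu_src) / s_src, so the Euler result can be checked against it; a real system would call a pre-trained velocity network instead of `toy_velocity`.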

What's the solution?

FlowEdit removes the need for both inversion and optimization. Instead, it constructs an ordinary differential equation (ODE) that maps directly between the source and target distributions, which correspond to the source and target text prompts. This direct path has a lower transport cost than going through inversion, and the authors report state-of-the-art results with Stable Diffusion 3 and FLUX.
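A minimal sketch of an inversion-free, source-to-target ODE, under the same hypothetical toy assumptions (rectified-flow convention, each prompt modeled as a Gaussian (mu, s) with a closed-form velocity). It illustrates the general idea as we understand it and is not the authors' code: the edited sample starts at the source image and is moved by the difference between target- and source-conditioned velocities. These velocities are evaluated at a noisy copy of the source and at a correspondingly shifted target point that share the same noise. `flowedit_sketch`, `n_avg`, and the uniform time grid are illustrative choices.

```python
import numpy as np


def toy_velocity(z, t, mu, s):
    """Hypothetical stand-in for a text-conditioned flow model.

    Exact marginal velocity E[eps - x0 | z_t = z] for z_t = (1 - t) * x0 + t * eps,
    with x0 ~ N(mu, s^2 I) and eps ~ N(0, I). The "prompt" is just (mu, s).
    """
    var_t = (1 - t) ** 2 * s ** 2 + t ** 2
    gain = (t - (1 - t) * s ** 2) / var_t
    return gain * (z - (1 - t) * mu) - mu


def flowedit_sketch(x_src, src, tar, steps=50, n_avg=1, seed=0):
    """Inversion-free editing sketch: integrate a direct source-to-target ODE from t=1 to t=0.

    src and tar are (mu, s) pairs playing the role of the source and target prompts.
    """
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    z_fe = x_src.copy()  # the edited sample starts at the source image, not at noise
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v_delta = np.zeros_like(x_src)
        for _ in range(n_avg):  # average the velocity difference over fresh noise draws
            eps = rng.standard_normal(x_src.shape)
            z_src = (1 - t_cur) * x_src + t_cur * eps  # noisy source image
            z_tar = z_fe + z_src - x_src               # same noise, offset by the current edit
            v_delta += toy_velocity(z_tar, t_cur, *tar) - toy_velocity(z_src, t_cur, *src)
        z_fe = z_fe + (t_next - t_cur) * v_delta / n_avg  # Euler step (t_next < t_cur)
    return z_fe


x = np.array([0.3, -1.2])
print(flowedit_sketch(x, (np.zeros(2), 1.0), (np.array([1.0, 0.0]), 1.0)))  # ≈ [ 1.3 -1.2]
```

When both toy prompts share the same spread s, the noise cancels in the velocity difference and the sketch returns x_src + (mu_tar - mu_src): coordinates where the two prompts agree are left untouched, and no noise map of the image is ever computed.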

Why does it matter?

This matters because it simplifies text-based image editing and makes it more accessible. Because FlowEdit is model agnostic, the same editing procedure can be applied to different pre-trained flow models without architecture-specific adjustments, opening up new possibilities for creative work in digital art.

Abstract

Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free and model agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project's webpage.
