Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

2024-10-15

Summary

This paper introduces a zero-shot method for inverting and editing real images with rectified flow models such as Flux. Inversion maps an image back to a structured noise representation, from which the image can be recovered or edited.

What's the problem?

Generative models create images from random noise. Editing a real image requires reversing this process (called inversion) to recover the noise that would produce it. For diffusion models, inversion struggles to stay faithful to the original image while keeping it editable, because the drift and diffusion terms are nonlinear. The best existing approaches either train additional parameters or optimize latent variables at test time, and both are expensive in practice.
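To make "inversion" concrete, here is a minimal sketch on a one-dimensional toy problem. It is not the paper's method. It assumes standard normal noise at t = 0 and a hypothetical Gaussian "data" distribution N(MU, S^2) at t = 1, for which the rectified-flow velocity has a closed form. The names `velocity`, `integrate`, `generate`, and `invert` are illustrative. Generation integrates the flow ODE forward in time with Euler steps, and inversion integrates the same ODE backward.

```python
# Toy setup (an assumption for illustration): noise ~ N(0, 1), data ~ N(MU, S^2).
MU, S = 2.0, 0.5


def velocity(x, t):
    """Closed-form E[x1 - x0 | x_t = x] for the straight path x_t = (1 - t) x0 + t x1."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2          # variance of x_t
    cov = t * S ** 2 - (1.0 - t)                   # Cov(x1 - x0, x_t)
    return MU + cov / var_t * (x - t * MU)


def integrate(x, t_start, t_end, steps=1000):
    """Euler integration of dx/dt = velocity(x, t); works in either time direction."""
    h = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x += h * velocity(x, t)
        t += h
    return x


def generate(noise, steps=1000):
    """Noise -> sample: integrate from t = 0 to t = 1."""
    return integrate(noise, 0.0, 1.0, steps)


def invert(sample, steps=1000):
    """Sample -> noise: integrate the same ODE from t = 1 back to t = 0."""
    return integrate(sample, 1.0, 0.0, steps)


if __name__ == "__main__":
    x1 = generate(1.0)
    print("generated:", x1, "recovered noise:", invert(x1))
```

In this toy case the round trip recovers the starting noise up to Euler discretization error. With a learned, high-dimensional velocity field the same round trip is much harder to keep both faithful and editable, which is the gap the paper targets.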

What's the solution?

The authors use Rectified Flows (RFs) instead of diffusion models for inversion. They formulate inversion as a dynamic optimal control problem, derive the controller with a linear quadratic regulator, and prove that the resulting vector field is equivalent to a rectified stochastic differential equation. They also extend the framework to design a stochastic sampler for Flux. The method works zero-shot, without training additional parameters, and it outperforms prior work on stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference.
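The optimal-control idea can be pictured with a small sketch. It is a toy under stated assumptions, not the authors' algorithm. For the simple dynamics dY = c dt with a quadratic energy cost and an infinitely weighted terminal penalty, the linear quadratic regulator solution is the feedback c = (target - Y) / (1 - t). The code blends that controller with a stand-in drift using a hypothetical gain `gamma`. The blending rule, the function names, and `toy_drift` (a placeholder for a pretrained velocity network) are all assumptions made for illustration.

```python
def lqr_control(y, target, t):
    """Minimum-energy feedback that steers y to target at time 1 (terminal weight -> infinity)."""
    return [(b - a) / (1.0 - t) for a, b in zip(y, target)]


def toy_drift(y, t):
    """Hypothetical stand-in for a pretrained rectified-flow velocity; here simply -y."""
    return [-a for a in y]


def controlled_inversion(image, target_noise, gamma=0.5, steps=100):
    """Euler-integrate dY = [u + gamma * (c - u)] dt from t = 0 to t = 1.

    gamma = 0 follows the drift alone; gamma = 1 follows the controller alone.
    """
    y = list(image)
    for k in range(steps):
        t = k / steps                      # t stays strictly below 1, so 1 - t > 0
        u = toy_drift(y, t)
        c = lqr_control(y, target_noise, t)
        y = [a + (ui + gamma * (ci - ui)) / steps for a, ui, ci in zip(y, u, c)]
    return y


if __name__ == "__main__":
    image = [1.0, -2.0, 0.5]
    noise = [0.3, 0.1, -0.4]               # a fixed sample standing in for the target noise
    for g in (0.0, 0.5, 1.0):
        print(g, controlled_inversion(image, noise, gamma=g))
```

With `gamma=1.0` the final state lands on the target noise, and with `gamma=0.0` the target is ignored entirely. Intermediate values trade off following the model's own dynamics against being pulled toward the chosen noise sample.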

Why does it matter?

This research is important because it improves how we can edit and recover images using AI. By providing a more effective method for image inversion and editing, it opens up new possibilities for applications in art, design, and any field that relies on visual content creation.

Abstract

Generative models transform random noise into images; their inversion aims to transform images back to structured noise for recovery and editing. This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using stochastic equivalents of rectified flow models (such as Flux). Although Diffusion Models (DMs) have recently dominated the field of generative modeling for images, their inversion presents faithfulness and editability challenges due to nonlinearities in drift and diffusion. Existing state-of-the-art DM inversion approaches rely on training of additional parameters or test-time optimization of latent variables; both are expensive in practice. Rectified Flows (RFs) offer a promising alternative to diffusion models, yet their inversion has been underexplored. We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator. We prove that the resulting vector field is equivalent to a rectified stochastic differential equation. Additionally, we extend our framework to design a stochastic sampler for Flux. Our inversion method allows for state-of-the-art performance in zero-shot inversion and editing, outperforming prior works in stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference.
