FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag

2024-12-16

Summary

This paper introduces FluxSpace, a new method for editing images that enables precise changes to specific parts of an image without affecting other areas.

What's the problem?

Current image editing techniques often struggle to make specific changes without altering unrelated parts of the image. For example, if you want to change someone's hair color, existing methods might also change their facial features or the background, leading to unrealistic results.

What's the solution?

FluxSpace edits images through a representation space, built from the features of the transformer blocks inside rectified flow models such as Flux, in which different aspects of an image are separated from one another. This lets users make targeted edits, such as changing the color of an object or adding details, while keeping everything else intact, and the approach works efficiently without retraining the model.
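The core idea, isolating an attribute as a direction in a representation space and shifting features along it, can be sketched as follows. This is a minimal NumPy illustration with synthetic vectors, not the actual FluxSpace code: all names are hypothetical, and in practice the features would come from a rectified flow transformer rather than random vectors.

```python
import numpy as np

# Hypothetical setup: `base_features` stand in for transformer-block
# outputs of a rectified flow model such as Flux. Here they are just
# synthetic vectors for illustration.
rng = np.random.default_rng(0)
dim = 64

base_features = rng.normal(size=dim)                   # original image features
features_with_attr = base_features + 0.1 * rng.normal(size=dim)  # e.g. with "smiling"

# The edit direction isolates the target attribute as the (normalized)
# difference between features with and without that attribute.
edit_direction = features_with_attr - base_features
edit_direction /= np.linalg.norm(edit_direction)

def apply_edit(features, direction, scale):
    """Shift features along the attribute direction; `scale` controls
    edit strength while components orthogonal to the direction
    (unrelated attributes) are left untouched."""
    return features + scale * direction

edited = apply_edit(base_features, edit_direction, scale=2.0)

# The component of the features orthogonal to the edit direction is
# unchanged, which is what "disentangled" means in this sketch.
orig_orth = base_features - np.dot(base_features, edit_direction) * edit_direction
edit_orth = edited - np.dot(edited, edit_direction) * edit_direction
print(np.allclose(orig_orth, edit_orth))  # True
```

The point of the sketch is the disentanglement property: because the edit is a pure shift along one direction, everything orthogonal to that direction (here, all other attributes) is provably unaffected.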

Why it matters?

This research matters because it gives artists and designers finer control over individual elements of an image, making high-quality edits easier to produce. That control is useful in applications such as graphic design, advertising, and art creation, opening up more creative possibilities.

Abstract

Rectified flow models have emerged as a dominant approach in image generation, showcasing impressive capabilities in high-quality image synthesis. However, despite their effectiveness in visual generation, rectified flow models often struggle with disentangled editing of images. This limitation prevents the ability to perform precise, attribute-specific modifications without affecting unrelated aspects of the image. In this paper, we introduce FluxSpace, a domain-agnostic image editing method leveraging a representation space with the ability to control the semantics of images generated by rectified flow transformers, such as Flux. By leveraging the representations learned by the transformer blocks within the rectified flow models, we propose a set of semantically interpretable representations that enable a wide range of image editing tasks, from fine-grained image editing to artistic creation. This work offers a scalable and effective image editing approach, along with its disentanglement capabilities.