
BrushEdit: All-In-One Image Inpainting and Editing

Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Qiang Xu

2024-12-17


Summary

This paper presents BrushEdit, a new image editing tool that combines multimodal large language models with diffusion-based image inpainting so that users can edit images simply by giving instructions in natural language. It aims to make image editing more interactive and user-friendly.

What's the problem?

Current image editing methods struggle with large changes, like adding or removing objects. Inversion-based approaches preserve so much of the original image's structure that substantial edits are hard to achieve, while instruction-based approaches act as black boxes, giving users no direct way to specify which region to edit or how strong the edit should be.

What's the solution?

BrushEdit addresses these issues by combining multimodal large language models (MLLMs) with an image inpainting model. Users give free-form instructions for editing an image, and the system interprets them in four steps: classifying the type of edit, identifying the main object, generating a mask for the region to be edited, and inpainting that region to carry out the change (a sketch of this pipeline follows below). This makes the editing process more intuitive and responsive to user intent.
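
To make the four-step flow concrete, here is a minimal sketch that treats each component as an interchangeable callable. The names EditPlan, brushedit_style_edit, plan_with_mllm, segment, and inpaint are illustrative placeholders for this summary, not BrushEdit's actual API:

```python
from dataclasses import dataclass
from typing import Callable

from PIL import Image


@dataclass
class EditPlan:
    """What the MLLM extracts from a free-form instruction."""
    edit_type: str      # e.g. "add", "remove", "replace", "modify"
    target_object: str  # e.g. "the red car"
    caption: str        # text prompt describing the desired edited region


def brushedit_style_edit(
    image: Image.Image,
    instruction: str,
    plan_with_mllm: Callable[[Image.Image, str], EditPlan],
    segment: Callable[[Image.Image, str], Image.Image],
    inpaint: Callable[[Image.Image, Image.Image, str], Image.Image],
) -> Image.Image:
    # Steps 1-2: the MLLM classifies the edit category and names the
    # main object the instruction refers to.
    plan = plan_with_mllm(image, instruction)
    # Step 3: a grounding/segmentation model turns the object name into
    # a binary mask over the region to be edited.
    mask = segment(image, plan.target_object)
    # Step 4: the inpainting model repaints only the masked region,
    # guided by the caption, leaving the rest of the image untouched.
    return inpaint(image, mask, plan.caption)
```

In practice, inpaint could be backed by an off-the-shelf diffusion inpainting pipeline and segment by a text-grounded segmentation model; the paper's contribution is the agent-cooperative coordination of these pieces together with its own dual-branch inpainting model.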

Why it matters?

BrushEdit is important because it makes image editing software easier and more efficient to use. By letting users edit images with natural language, it opens up new possibilities for creativity and accessibility in digital art and design.

Abstract

Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with big modifications (e.g., adding or removing objects) due to the structured nature of inversion noise, which hinders substantial changes. Meanwhile, instruction-based methods often constrain users to black-box operations, limiting direct interaction for specifying editing regions and intensity. To address these limitations, we propose BrushEdit, a novel inpainting-based instruction-guided image editing paradigm, which leverages multimodal large language models (MLLMs) and image inpainting models to enable autonomous, user-friendly, and interactive free-form instruction editing. Specifically, we devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model in an agent-cooperative framework to perform editing category classification, main object identification, mask acquisition, and editing area inpainting. Extensive experiments show that our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics including mask region preservation and editing effect coherence.
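
The abstract's "dual-branch" inpainting model is not detailed in this summary, but the general pattern it refers to, a frozen generation branch plus a trainable branch that sees the masked image and mask and injects features back into the frozen one, can be illustrated with a toy PyTorch module. The architecture below is a deliberately simplified stand-in under that assumption, not BrushEdit's actual network:

```python
import torch
import torch.nn as nn


class ToyDualBranchInpainter(nn.Module):
    """Toy two-branch inpainter: a frozen 'base' branch (standing in for a
    pretrained diffusion UNet) plus a trainable branch conditioned on the
    masked image and mask. Layer sizes are arbitrary."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Frozen base branch; its weights are not updated during training.
        self.base = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Trainable branch: input is masked image (3 ch) + binary mask (1 ch).
        self.branch = nn.Conv2d(4, channels, 3, padding=1)
        # Zero-initialized projection so the branch starts as a no-op and
        # only gradually learns to steer the frozen base.
        self.zero_proj = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, noisy: torch.Tensor, masked: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        hint = self.zero_proj(self.branch(torch.cat([masked, mask], dim=1)))
        # Inject the branch features into the base branch's hidden state.
        h = self.base[1](self.base[0](noisy) + hint)
        return self.base[2](h)
```

Because the base branch stays frozen, only the conditioning branch is trained, which is what lets this family of designs preserve the unmasked region while repainting the masked one.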