
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji

2024-07-26


Summary

This paper presents Diffree, a method that adds new objects to images using only text descriptions. It simplifies image editing by removing the need for manually drawn masks, bounding boxes, or other complex manual adjustments.

What's the problem?

Adding new objects to images while keeping the result realistic is difficult. Existing methods often require users to outline where the new object should go or to provide detailed masks, which is time-consuming and complicated. Moreover, these methods sometimes fail to preserve the original background, leading to unrealistic results.

What's the solution?

Diffree uses a diffusion model to add objects based on simple text instructions. The researchers built a dataset called OABench, which pairs images that have had an object removed with a mask of that object and a text description of it. Trained on this data with an extra mask prediction module, Diffree learns to predict where a new object should go and to match the lighting and texture of the original image. Users can simply describe what they want in words, and Diffree handles the rest, producing high-quality results without any extra input.
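To make the training setup more concrete, here is a minimal sketch of what one OABench-style example might look like in code. Only the four components of each tuple (original image, object-removed image, object mask, object description) come from the paper; the field names, file layout, and loader function are illustrative assumptions.

```python
from dataclasses import dataclass
from PIL import Image


@dataclass
class OABenchExample:
    """One OABench-style tuple, as described in the paper.

    The paper specifies four components per tuple; the names used
    here are assumptions made for illustration.
    """
    original: Image.Image        # real-world image that contains the object
    object_removed: Image.Image  # same scene with the object inpainted away
    object_mask: Image.Image     # binary mask marking where the object was
    description: str             # text describing the removed object


def load_example(stem: str) -> OABenchExample:
    """Hypothetical loader: the file naming scheme is an assumption."""
    return OABenchExample(
        original=Image.open(f"{stem}_original.png").convert("RGB"),
        object_removed=Image.open(f"{stem}_removed.png").convert("RGB"),
        object_mask=Image.open(f"{stem}_mask.png").convert("L"),
        description=open(f"{stem}_caption.txt").read().strip(),
    )
```

During training, the object-removed image and the description would serve as inputs while the original image and the mask act as targets, which is why, at inference time, only an image and a text prompt are needed.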

Why it matters?

This method is significant because it makes image editing more accessible and efficient for everyone, from artists to everyday users. By allowing people to edit images with just text, it opens up new possibilities for creativity and simplifies tasks that previously required technical skills.

Abstract

This paper addresses an important problem of object addition for images with only text guidance. It is challenging because the new object must be integrated seamlessly into the image with consistent visual context, such as lighting, texture, and spatial location. While existing text-guided image inpainting methods can add objects, they either fail to preserve the background consistency or involve cumbersome human intervention in specifying bounding boxes or user-scribbled masks. To tackle this challenge, we introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control. To this end, we curate OABench, an exquisite synthetic dataset by removing objects with advanced image inpainting techniques. OABench comprises 74K real-world tuples of an original image, an inpainted image with the object removed, an object mask, and object descriptions. Trained on OABench using the Stable Diffusion model with an additional mask prediction module, Diffree uniquely predicts the position of the new object and achieves object addition with guidance from only text. Extensive experiments demonstrate that Diffree excels in adding new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
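The predicted mask is central to the background-consistency claim: once the model knows where the object belongs, the edit can be confined to that region. The snippet below is a generic sketch of such a mask-guided compositing step, assuming a predicted soft mask and a generated image of the same size; it is not taken from the Diffree code and may differ from the authors' exact implementation.

```python
import numpy as np


def composite(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend a generated object into the original image using a predicted mask.

    original, generated: float arrays of shape (H, W, 3) with values in [0, 1]
    mask: float array of shape (H, W) in [0, 1], 1 inside the predicted object region

    Pixels outside the mask are copied unchanged from the original, which is
    what keeps the background consistent; only the masked region comes from
    the generated content.
    """
    alpha = mask[..., None]  # broadcast the mask over the RGB channels
    return alpha * generated + (1.0 - alpha) * original
```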