Magic Insert: Style-Aware Drag-and-Drop

Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yael Pritch, Michael Rubinstein, David E. Jacobs, Shlomi Fruchter

2024-07-03

Summary

This paper introduces Magic Insert, a technique that lets users drag and drop a subject from one image into another image with a different artistic style, while keeping the inserted subject realistic and well integrated with its new background.

What's the problem?

The main problem is that when inserting an object from one image into another, especially when the two images have different styles (for example, a cartoon and a photograph), it is difficult to make the insertion look natural. Traditional methods often produce unrealistic or poorly integrated results.

What's the solution?

To solve this, the authors developed Magic Insert, which tackles two sub-problems: style-aware personalization and realistic object insertion. First, they fine-tune a pretrained text-to-image diffusion model with LoRA and learned text tokens so it captures the subject, then infuse it with a CLIP representation of the target image's style. Second, they use a technique called Bootstrapped Domain Adaptation to adapt a photorealistic object-insertion model to diverse artistic styles. The method is shown to work much better than older approaches such as inpainting, which often produce unsatisfactory results.
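The two-stage pipeline described above can be sketched at a high level as follows. This is a purely illustrative outline, not the authors' implementation: every function name and data structure here (e.g. `style_aware_personalization`, the dicts standing in for model weights) is a hypothetical stand-in used only to show how the stages fit together.

```python
# Hypothetical sketch of the Magic Insert pipeline. All names and
# structures are illustrative stand-ins, not the paper's actual code.

def style_aware_personalization(subject_image, target_image):
    """Stage 1: adapt a pretrained text-to-image diffusion model to the
    subject, then infuse it with the target image's style."""
    # (a) Fine-tune with LoRA and learned text tokens on the subject image
    #     (a dict stands in for the adapted model here).
    personalized_model = {
        "base": "pretrained text-to-image diffusion model",
        "lora_weights": f"fit to {subject_image}",
        "learned_tokens": ["<subject>"],
    }
    # (b) Infuse a CLIP representation of the target image's style.
    personalized_model["style_embedding"] = f"CLIP({target_image})"
    return personalized_model

def insert_subject(personalized_model, target_image, position):
    """Stage 2: insert the stylized subject using an object-insertion
    model adapted to artistic styles via bootstrapped domain adaptation."""
    stylized_subject = f"sample of {personalized_model['learned_tokens'][0]}"
    return {
        "background": target_image,
        "subject": stylized_subject,
        "position": position,
        # The insertion model harmonizes the result (e.g. plausible
        # shadows and lighting) so the subject fits the scene.
        "harmonized": True,
    }

def magic_insert(subject_image, target_image, position):
    model = style_aware_personalization(subject_image, target_image)
    return insert_subject(model, target_image, position)

result = magic_insert("photo_of_dog.png", "watercolor_scene.png", (120, 340))
print(result["harmonized"])  # True
```

The key design point the sketch captures is the separation of concerns: personalization makes the subject match the target style, while the insertion model handles physical plausibility in the scene.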

Why it matters?

This research is important because it enhances how we can manipulate images creatively, allowing for more flexibility in art and design. By improving the ability to insert subjects into various artistic styles seamlessly, it opens up new possibilities for artists, designers, and anyone working with visual media.

Abstract

We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images. For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image, and then infuses it with a CLIP representation of the target style. For object insertion, we use Bootstrapped Domain Adaptation to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting. Finally, we present a dataset, SubjectPlop, to facilitate evaluation and future progress in this area. Project page: https://magicinsert.github.io/