PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Mike Zheng Shou, Jiaming Liu
2025-02-24
Summary
This paper introduces PhotoDoodle, an image editing framework that lets artists overlay decorative elements onto photos while preserving the original background and blending the additions into the scene.
What's the problem?
Adding artistic elements to a photo is difficult because the additions must look natural, match the photo's perspective, and fit its context, while the background stays undistorted. The artist's style also has to be learned from only a few examples. Existing methods focus mainly on global style transfer, which restyles the whole image, or on regional inpainting, so neither is well suited to customized, artist-specific edits.
What's the solution?
The researchers developed PhotoDoodle, which uses a two-stage training process. First, they trained a general-purpose image editing model called OmniEditor on large-scale data. Then, they fine-tuned it with EditLoRA on a small, artist-curated dataset of before-and-after image pairs so that it learns a specific artist's editing style. They also introduced a positional encoding reuse (cloning) mechanism to keep the generated result consistent with the original photo.
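The second training stage can be illustrated with a generic low-rank adapter. This is a minimal sketch that assumes EditLoRA follows the standard LoRA formulation (a frozen weight plus a trainable low-rank update); the class name `LoRALinear`, its parameters, and the initialization values are hypothetical and are not taken from the paper's code.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer with a trainable low-rank update (illustrative only).

    Effective output: W x + (alpha / rank) * B (A x)
    W stands in for a pretrained weight and is never modified here.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape d_out x d_in
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the base model's output unchanged.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_parameter_count(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))
    print(layer.trainable_parameter_count())
```

Because only the two small matrices are trained, a setup like this can adapt a large pretrained editor from a handful of image pairs without touching its base weights, which is the general motivation for low-rank fine-tuning.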
Why does it matter?
This matters because it makes artistic photo editing more accessible. PhotoDoodle lets artists create detailed, personalized edits without distorting the original image, and it can learn an artist's style from a small set of example pairs. The authors also release a dataset of six doodle styles and report strong performance and robustness in customized image editing.
Abstract
We introduce PhotoDoodle, a novel image editing framework designed to facilitate photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background, requiring realistic blending, perspective alignment, and contextual coherence. Additionally, the background must be preserved without distortion, and the artist's unique style must be captured efficiently from limited training data. These requirements are not addressed by previous methods that primarily focus on global style transfer or regional inpainting. The proposed method, PhotoDoodle, employs a two-stage training strategy. Initially, we train a general-purpose image editing model, OmniEditor, using large-scale data. Subsequently, we fine-tune this model with EditLoRA using a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism. Additionally, we release a PhotoDoodle dataset featuring six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.
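One way to picture the positional encoding reuse mechanism is as a shared position index between the source photo and the edited output. The sketch below assumes that both images are split into the same grid of tokens and that the source tokens are given the same (row, column) positions as the target tokens; the function names are hypothetical, and the abstract does not specify these details.

```python
def grid_position_ids(height, width):
    """Return (row, col) position ids for a height x width token grid."""
    return [(row, col) for row in range(height) for col in range(width)]


def build_sequence_position_ids(height, width):
    """Position ids for a sequence of [target tokens, source tokens].

    Assumption: the source-image tokens reuse the target tokens' ids
    instead of receiving fresh, offset positions.
    """
    target_ids = grid_position_ids(height, width)
    source_ids = list(target_ids)  # reused, not shifted
    return target_ids + source_ids


if __name__ == "__main__":
    ids = build_sequence_position_ids(2, 3)
    # The token at grid cell (1, 1) has the same id in both halves.
    print(ids[4], ids[6 + 4])
```

With shared ids, a token in the output and the token at the same location in the source photo carry identical positional information, which is one plausible reason such a scheme would help keep the background spatially consistent.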