UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
2024-07-09

Summary
This paper introduces UltraEdit, a new dataset for instruction-based image editing that contains approximately 4 million examples. It aims to improve the quality and variety of image editing training data by addressing the weaknesses of previous datasets.
What's the problem?
The main problem is that existing image editing datasets, such as InstructPix2Pix and MagicBrush, have notable limitations: their editing instructions lack diversity, and they rely on synthetic images produced by text-to-image models. This can introduce bias and make training less effective for image editing models. In addition, many datasets do not annotate the specific image regions that need editing.
What's the solution?
To solve these issues, the authors developed UltraEdit, a large-scale dataset covering a wide range of editing instructions. They anchored the data on real images, including photographs and artworks, to ensure diversity and reduce bias. UltraEdit also supports region-based editing, allowing more precise edits in specific areas of an image. To build the dataset, large language models (LLMs) generate editing instructions, seeded with in-context examples written by human raters; a sketch of this step appears below. Models trained on UltraEdit performed significantly better on established benchmarks than those trained on previous datasets.
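The following is a minimal sketch of the instruction-generation step described above, assuming an OpenAI-compatible chat API. The prompt wording, seed-example pool, helper name, and model choice are all illustrative assumptions, not the authors' released pipeline.

```python
# Sketch: generate a new editing instruction for a real image, seeding the LLM
# with human-written in-context editing examples (hypothetical seed pool).
from openai import OpenAI

client = OpenAI()

SEED_EXAMPLES = [
    "Caption: a dog sitting on a couch -> Instruction: replace the dog with a cat",
    "Caption: a city street at noon -> Instruction: make it a rainy night scene",
]

def generate_edit_instruction(image_caption: str) -> str:
    """Ask the LLM for a fresh editing instruction for the captioned image."""
    prompt = (
        "You write concise image-editing instructions.\n"
        "Examples:\n" + "\n".join(SEED_EXAMPLES) + "\n"
        f"Caption: {image_caption} -> Instruction:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the paper may use a different LLM
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(generate_edit_instruction("a bowl of apples on a wooden table"))
```

Seeding each request with human-authored examples is what lets the LLM propose varied instructions while staying in the expected format.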
Why it matters?
This research is important because it enhances the tools available for image editing by providing a more comprehensive and high-quality dataset. By improving how models learn to edit images, UltraEdit can lead to better performance in various applications like graphic design, advertising, and digital art, ultimately making it easier for creators to produce high-quality visual content.
Abstract
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks of existing image editing datasets like InstructPix2Pix and MagicBrush, and to provide a systematic approach to producing massive, high-quality image editing samples. UltraEdit offers several distinct advantages: 1) It features a broader range of editing instructions by leveraging the creativity of large language models (LLMs) alongside in-context editing examples from human raters; 2) Its data sources are based on real images, including photographs and artworks, which provide greater diversity and reduced bias compared to datasets solely generated by text-to-image models; 3) It also supports region-based editing, enhanced by high-quality, automatically produced region annotations. Our experiments show that canonical diffusion-based editing baselines trained on UltraEdit set new records on the MagicBrush and Emu-Edit benchmarks. Our analysis further confirms the crucial role of real image anchors and region-based editing data. The dataset, code, and models can be found at https://ultra-editing.github.io.
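To illustrate what a "canonical diffusion-based editing baseline" looks like at inference time, here is a minimal sketch using the public InstructPix2Pix checkpoint from the diffusers library. These are not the UltraEdit-trained weights (those are released via the project page above), and the input path and hyperparameters are illustrative.

```python
# Sketch: instruction-based editing with a diffusion baseline of the kind the
# paper fine-tunes on UltraEdit.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")  # illustrative input path
edited = pipe(
    "replace the dog with a cat",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly the output stays close to the input
).images[0]
edited.save("edited.jpg")
```

Such a pipeline takes both a source image and a free-form text instruction, which is exactly the (image, instruction, edited image) triplet format UltraEdit supplies at scale for training.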