DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, Hongxia Yang
2024-10-29

Summary
This paper introduces DreamClear, a new image restoration model that repairs real-world degradations such as blur and noise. It is trained on a large, privacy-safe dataset.
What's the problem?
Restoring images in real-world scenarios is challenging for two reasons. Existing models often lack the capacity to handle the diverse degradations found in real photos, and there are not enough high-quality datasets to train them. Most existing image restoration datasets contain only a few thousand images and lack diversity, which limits how well larger models generalize to new situations.
What's the solution?
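The authors propose a two-part solution. First, they develop GenIR, a data curation pipeline that produces a dataset of one million high-quality images while respecting privacy and copyright. Instead of crawling images from the web, GenIR works in three stages: it constructs image-text pairs, fine-tunes a text-to-image model using dual prompts, and then generates new images and filters them to maintain quality. Second, they introduce DreamClear, a restoration model built on the Diffusion Transformer (DiT) architecture. DreamClear combines the generative priors of text-to-image diffusion models with the perceptual capabilities of multi-modal large language models. It also uses a Mixture of Adaptive Modulator (MoAM), which dynamically blends several restoration experts according to the degradation present in each part of the image. This allows it to handle a wide range of real-world scenarios.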
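To make GenIR's final stage (data generation & filtering) more concrete, here is a minimal Python sketch under stated assumptions. It assumes that the fine-tuned text-to-image model is steered by a learned positive prompt and a learned negative prompt, combined in classifier-free-guidance style. This is our reading of the abstract's "dual-prompt" fine-tuning, not a confirmed detail. It also assumes that filtering keeps only candidates whose quality score reaches a threshold. `guided_prediction`, `generate_and_filter`, `Sample`, the stub generator and scorer, and the threshold are all hypothetical. The real pipeline uses a diffusion model and learned filters that are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def guided_prediction(pred_pos: Sequence[float], pred_neg: Sequence[float],
                      scale: float) -> List[float]:
    """Assumed classifier-free-guidance-style mix: start from the denoiser's
    prediction under the (hypothetical) learned negative prompt and move
    `scale` times toward its prediction under the positive prompt."""
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


@dataclass
class Sample:
    caption: str
    image: object   # whatever the hypothetical generator returns
    quality: float  # score from the hypothetical quality filter


def generate_and_filter(captions: Sequence[str],
                        generate: Callable[[str], object],
                        score: Callable[[object], float],
                        threshold: float) -> List[Sample]:
    """Sketch of generation & filtering: synthesize one candidate per caption
    and keep only candidates whose quality score reaches `threshold`."""
    kept: List[Sample] = []
    for caption in captions:
        image = generate(caption)
        quality = score(image)
        if quality >= threshold:
            kept.append(Sample(caption, image, quality))
    return kept


# Stand-in callables for illustration only. A real pipeline would call a
# fine-tuned text-to-image model and a learned quality/privacy filter here.
def fake_generate(caption: str) -> str:
    return caption.upper()  # stub "image"


def fake_score(image: str) -> float:
    return 0.9 if "SHARP" in image else 0.2


captions = ["a sharp photo of a cat", "a blurry street at night"]
dataset = generate_and_filter(captions, fake_generate, fake_score, threshold=0.5)
print([s.caption for s in dataset])                          # ['a sharp photo of a cat']
print(guided_prediction([1.0, 2.0], [0.0, 1.0], scale=3.0))  # [3.0, 4.0]
```

Separating generation from scoring keeps the filter swappable. For example, a privacy check and a quality check could be chained inside `score` without changing the loop.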
Why does it matter?
This research matters because it provides a stronger tool for improving image quality, with potential uses in areas such as photography, medical imaging, and video production. DreamClear is trained on a large, diverse dataset that avoids privacy and copyright problems. As a result, it can produce clearer, more photorealistic restorations, which is useful for both professionals and everyday users.
Abstract
Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal large language models (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models will be available at: https://github.com/shallowdream204/DreamClear.
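The abstract states only that MoAM uses token-wise degradation priors to dynamically integrate restoration experts. As a hedged illustration, the following minimal sketch shows that idea under our own assumptions:

- each token carries a degradation-prior vector;
- a linear gate turns that vector into softmax weights over K experts;
- each expert is a fixed (scale, shift) pair, and the weighted mix is applied as AdaLN-style modulation `x * (1 + scale) + shift`.

The names `moam`, `moam_token`, `gate`, and `experts` are hypothetical. In DreamClear the experts are learned networks whose exact form is not described here, so this is not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical expert: a (scale, shift) pair, each a list of length C.
Expert = Tuple[List[float], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moam_token(x: Sequence[float], prior: Sequence[float],
               gate: Sequence[Sequence[float]],
               experts: Sequence[Expert]) -> Tuple[List[float], List[float]]:
    """Modulate one token feature `x` (length C) using its degradation prior
    `prior` (length D). `gate` has K rows of length D, one per expert."""
    # Assumed gating: linear scores from the prior, normalized with softmax.
    mix = softmax([sum(w * p for w, p in zip(row, prior)) for row in gate])
    channels = range(len(x))
    # Blend the experts' modulation parameters with the token's own weights.
    scale = [sum(m * e[0][c] for m, e in zip(mix, experts)) for c in channels]
    shift = [sum(m * e[1][c] for m, e in zip(mix, experts)) for c in channels]
    out = [xc * (1.0 + s) + b for xc, s, b in zip(x, scale, shift)]
    return out, mix


def moam(tokens: Sequence[Sequence[float]], priors: Sequence[Sequence[float]],
         gate: Sequence[Sequence[float]],
         experts: Sequence[Expert]) -> List[List[float]]:
    """Apply the mixture token by token, so each token gets its own weights."""
    return [moam_token(x, p, gate, experts)[0] for x, p in zip(tokens, priors)]


# Toy setup: prior dimension 0 favours expert 0, dimension 1 favours expert 1.
gate = [[1.0, 0.0], [0.0, 1.0]]
experts = [([1.0, 1.0], [0.0, 0.0]),   # expert 0: scales features by (1 + 1)
           ([0.0, 0.0], [2.0, 2.0])]   # expert 1: adds a shift of 2
tokens = [[1.0, 1.0], [1.0, 1.0]]      # identical features...
priors = [[0.0, 0.0], [10.0, 0.0]]     # ...but different degradation priors
print(moam(tokens, priors, gate, experts))  # first token -> [2.5, 2.5]; second close to [2.0, 2.0]
```

Because the weights are computed per token, two tokens with identical features but different priors receive different modulations. This is the adaptive behaviour the abstract attributes to MoAM.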