ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Rotem Shalev-Arkushin, Rinon Gal, Amit H. Bermano, Ohad Fried

2025-02-17

Summary

This paper introduces ImageRAG, a method that improves AI image generation by retrieving existing pictures and using them as references, which makes the results more accurate, especially for rare or unusual subjects.

What's the problem?

Current AI image generators are good at making high-quality and varied images, but they struggle with subjects that are rare or that they have not seen before. For such subjects they may get details wrong or produce images that do not match what was asked for.

What's the solution?

The researchers created ImageRAG, which works like a smart assistant for AI image generators. When someone asks for an image, ImageRAG first looks through a large collection of pictures to find ones that are relevant to the text prompt. It then passes these pictures to the image generator as context, guiding it toward a more accurate and detailed result. Earlier methods that used retrieved images trained models specifically for that purpose. ImageRAG instead relies on existing models that can already take images as input, so it needs no retrieval-specific training and can be applied to different kinds of base models.

Why does it matter?

This matters because it could make AI-generated images much more accurate and useful, especially for things that aren't common. It could help artists, designers, and anyone who needs to create specific images quickly. For example, it could be great for making illustrations of rare animals, unique architectural designs, or even imaginary scenes that combine real-world elements in new ways. By making AI better at understanding and creating a wider range of images, ImageRAG could open up new possibilities in fields like entertainment, education, and visual communication.

Abstract

Diffusion models enable high-quality and diverse visual content synthesis. However, they struggle to generate rare or unseen concepts. To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) with image generation models. We propose ImageRAG, a method that dynamically retrieves relevant images based on a given text prompt and uses them as context to guide the generation process. Prior approaches that used retrieved images to improve generation trained models specifically for retrieval-based generation. In contrast, ImageRAG leverages the capabilities of existing image conditioning models and does not require RAG-specific training. Our approach is highly adaptable and can be applied across different model types, showing significant improvement in generating rare and fine-grained concepts using different base models. Our project page is available at: https://rotem-shalev.github.io/ImageRAG
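
To make the retrieve-then-condition flow more concrete, here is a minimal Python sketch. It is an illustration only and not the authors' implementation: the abstract does not say how relevance is measured, so the sketch assumes that prompts and images share an embedding space and ranks images by cosine similarity. The three-dimensional vectors, the file names, and the functions retrieve_references and generate_with_references are all hypothetical, and the generation step is a stub that only shows where an existing image-conditioned model would be called.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_references(prompt_vec, image_index, k=2):
    """Return the names of the k indexed images most similar to the prompt.

    Assumption: similarity is cosine similarity in a shared text-image
    embedding space. The source does not specify the retrieval metric.
    """
    ranked = sorted(
        image_index,
        key=lambda name: cosine(prompt_vec, image_index[name]),
        reverse=True,
    )
    return ranked[:k]


def generate_with_references(prompt, reference_names):
    """Hypothetical stub for an existing image-conditioned generator.

    A real system would pass the retrieved images to the model as context.
    Here we only build the request that such a model would receive.
    """
    return {"prompt": prompt, "references": list(reference_names)}


# Toy, hand-written embeddings standing in for a real image collection.
image_index = {
    "okapi_01.jpg": [0.9, 0.1, 0.0],
    "zebra_07.jpg": [0.6, 0.4, 0.1],
    "teapot_03.jpg": [0.0, 0.1, 0.9],
}
prompt_vec = [1.0, 0.2, 0.0]  # stand-in embedding of "an okapi in a forest"

refs = retrieve_references(prompt_vec, image_index, k=2)
request = generate_with_references("an okapi in a forest", refs)
print(request)
```

With these toy vectors, the two animal images rank above the unrelated one and are attached to the request as references. The retrieval step sits entirely outside the generator, which is why, as the abstract notes, no retrieval-specific training of the base model is needed.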
