Gen-Searcher: Reinforcing Agentic Search for Image Generation
Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
2026-03-31
Summary
This paper introduces a new approach to creating images by combining image generation models with the ability to search the internet for information. It addresses the limitation of current image generators that struggle with tasks needing specific knowledge or up-to-date details.
What's the problem?
Existing image generation models are really good at making pictures that *look* real, but they rely on information they were already trained on. This means they often fail when asked to create images about things they don't 'know,' like current events or very specific objects. They lack the ability to look up information and incorporate it into the image creation process, limiting their usefulness in real-world scenarios.
What's the solution?
The researchers developed a system called Gen-Searcher. This system doesn't just generate an image; it first *searches* the internet for relevant text and reference images, then uses the retrieved information to create a more accurate, knowledge-grounded image. They also created new training datasets and a benchmark called KnowGen to specifically test how well models perform when they need external knowledge. Gen-Searcher was trained in two stages: first, supervised learning from examples, and then reinforcement learning with a reward that considers both text and image quality.
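The search-then-generate loop described above can be sketched as a minimal agent: retrieve evidence for the prompt, optionally hop again, then hand a grounded prompt plus reference images to the generator. This is an illustrative assumption of how such an agent could be structured, not the paper's actual implementation; `search`, the toy knowledge base, and the single-hop stopping rule are all stand-ins.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    text: str       # retrieved textual knowledge
    image_url: str  # URL of a retrieved reference image ("" if none)

def search(query: str) -> SearchResult:
    # Stub retriever: a real agent would call a web search API here.
    toy_kb = {
        "paris 2024 mascot": SearchResult(
            "The Paris 2024 mascot is the Phryge, a red triangular cap.",
            "https://example.com/phryge.jpg"),
    }
    return toy_kb.get(query.lower(), SearchResult("", ""))

def agentic_generate(prompt: str, max_hops: int = 3) -> dict:
    """Multi-hop search, then grounded generation (generation stubbed)."""
    evidence = []
    query = prompt
    for _ in range(max_hops):
        result = search(query)
        if not result.text:
            break  # no more useful evidence
        evidence.append(result)
        # A real agent would let the model propose the next query or stop;
        # this sketch stops after the first successful hop.
        break
    grounded = prompt + " | " + " ".join(r.text for r in evidence)
    return {
        "grounded_prompt": grounded,
        "reference_images": [r.image_url for r in evidence if r.image_url],
    }
```

In the real system, `grounded_prompt` and `reference_images` would condition the image generator, so the final picture reflects retrieved facts rather than only the model's frozen training knowledge.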
Why does it matter?
This work is important because it opens the door to image generation that isn't limited by pre-existing knowledge. By allowing models to search for information, we can create images about almost anything, even things that are new or require specialized understanding. The open-source release of the data, models, and code will help other researchers build upon this work and create even more powerful and versatile image generation systems.
Abstract
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, and thus often fail in real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesized images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models along multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
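The dual reward feedback for GRPO mentioned in the abstract can be sketched as follows. Note this is a generic illustration under stated assumptions: the weighted sum of text and image rewards and the group-wise standardization are the standard GRPO recipe, not necessarily the paper's exact formulation, and the weights are arbitrary.

```python
import statistics

def dual_reward(text_score: float, image_score: float,
                w_text: float = 0.5, w_image: float = 0.5) -> float:
    """Combine a text-based and an image-based reward into one scalar.
    The 50/50 weighting is an illustrative assumption."""
    return w_text * text_score + w_image * image_score

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages: standardize rewards within the group
    of rollouts sampled for the same prompt (the core idea of GRPO)."""
    mu = statistics.mean(group_rewards)
    sd = statistics.pstdev(group_rewards) or 1.0  # avoid divide-by-zero
    return [(r - mu) / sd for r in group_rewards]
```

Intuitively, combining two reward signals reduces the chance that a whole group of rollouts receives identical scores (which would zero out every advantage), which is one way a dual reward can yield more stable and informative learning signals.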