V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

Shenghe Zheng, Junpeng Jiang, Wenbo Li

2026-03-16

Summary

This paper explores how powerful video generation models, already adept at creating realistic videos, can also be surprisingly effective at restoring damaged images with very little training data.

What's the problem?

Current image restoration techniques, which aim to fix blurry, noisy, or otherwise degraded images, usually require massive datasets and often need a separate model trained for each specific type of damage. This is inefficient, and it ignores the fact that large video generation models likely already 'understand' how images *should* look, thanks to the vast visual data they were trained on.

What's the solution?

The researchers developed a method called V-Bridge that reframes image restoration as a process of gradually 'generating' a clean image, similar to how a video model refines frames over time. They took existing video models and showed that, with only a small amount of example data – just 1,000 pairs of clean images and their damaged counterparts – these models could restore images almost as well as models specifically designed for the task, and could even handle multiple types of damage with a single model.
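The idea of restoration as a progressive generative process can be illustrated with a toy sketch. The snippet below is a conceptual illustration only, not the paper's actual method: it starts from a degraded image and takes small refinement steps toward a clean one, with a hand-written 'oracle' standing in for what a trained video model would predict. The names `progressive_restore` and `oracle_velocity` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def progressive_restore(degraded, predict_velocity, num_steps=10):
    """Conceptual sketch: treat restoration as progressive generation.
    Starting from the degraded image, take small refinement steps,
    each guided by a predicted 'velocity' pointing toward the clean
    image (analogous to a video model refining frames over time)."""
    x = degraded.astype(np.float64).copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # progress along the degraded -> clean trajectory
        x = x + dt * predict_velocity(x, t)
    return x

# Toy demo: a known clean target and a noisy, degraded observation.
clean = np.linspace(0.0, 1.0, 16).reshape(4, 4)
degraded = clean + 0.3 * np.random.default_rng(0).standard_normal((4, 4))

def oracle_velocity(x, t):
    # A real system would learn this prediction from data; here we
    # cheat and point straight at the clean target for illustration.
    return clean - x

restored = progressive_restore(degraded, oracle_velocity, num_steps=50)
print(np.abs(restored - clean).mean())  # residual error shrinks each step
```

Each step moves the estimate a fraction `dt` closer to the target, so the restoration emerges gradually rather than in a single regression pass – the same framing V-Bridge uses to tap a video model's generative prior.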

Why it matters?

This work suggests that video models have a hidden ability to understand and fix visual problems, and that we don't necessarily need huge, specialized datasets for every image restoration task. It opens up the possibility of using large video models as a general foundation for many different visual tasks, blurring the line between creating images and understanding them, and pointing toward more efficient and versatile AI systems.

Abstract

Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.