DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

Chang-Han Yeh, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Ting-Hsuan Chen, Yu-Lun Liu

2024-07-02

Summary

This paper presents DiffIR2VR-Zero, a method for restoring videos without retraining the model for each specific task. It repurposes pre-trained image restoration models to enhance video quality, making video restoration easier and faster.

What's the problem?

Traditional methods for restoring videos often require retraining the model every time you want to work with different types of video problems, like noise removal or improving resolution. This can be time-consuming and inefficient, especially when dealing with various video formats and qualities. Additionally, many existing methods struggle to perform well across different types of video degradation, which limits their usefulness.

What's the solution?

To solve this issue, the authors developed DiffIR2VR-Zero, which works in a zero-shot setting: the method restores videos without any additional training on specific tasks. It achieves this through a hierarchical token merging strategy that processes keyframes (important frames in a video) together with local frames (the surrounding frames). To keep the restored video temporally consistent, the method combines optical flow (which tracks motion between frames) with feature-based nearest-neighbor matching. This approach allows the model to generalize better across different datasets and types of video degradation.
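To make the token-merging idea concrete, here is a minimal illustrative sketch (not the authors' implementation; the function name, threshold, and blending weight are assumptions for illustration). Each token from the current frame is matched to its most similar keyframe token by cosine similarity, and confident matches are blended so that shared content stays consistent across frames:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_neighbor_merge(key_tokens, frame_tokens, sim_thresh=0.9, alpha=0.5):
    """Hypothetical sketch of feature-based token merging: for each frame
    token, find its most similar keyframe token; if the match is confident
    (similarity above sim_thresh), blend the two tokens."""
    merged = []
    for ft in frame_tokens:
        sims = [cosine(ft, kt) for kt in key_tokens]
        j = max(range(len(sims)), key=sims.__getitem__)  # nearest keyframe token
        if sims[j] >= sim_thresh:
            # Confident match: pull the frame token toward the keyframe token
            merged.append([alpha * f + (1 - alpha) * k
                           for f, k in zip(ft, key_tokens[j])])
        else:
            merged.append(list(ft))  # no confident match: keep the token as-is
    return merged
```

In the actual method this matching is combined with optical flow, so correspondences come from motion where flow is reliable and from feature similarity elsewhere; the sketch above shows only the feature-matching half.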

Why it matters?

This research is important because it provides a more efficient way to restore videos without the need for extensive retraining, making it accessible for various applications in media production, streaming services, and other fields that require high-quality video output. By improving video restoration techniques, DiffIR2VR-Zero can help enhance the viewing experience in movies, games, and online content.

Abstract

This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid correspondence mechanism that blends optical flow and feature-based nearest neighbor matching (latent merging). We show that our method not only achieves top performance in zero-shot video restoration but also significantly surpasses trained models in generalization across diverse datasets and extreme degradations (8× super-resolution and high-standard-deviation video denoising). We present evidence through quantitative metrics and visual comparisons on various challenging datasets. Additionally, our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining. This research leads to more efficient and widely applicable video restoration technologies, supporting advancements in fields that require high-quality video output. See our project page for video results at https://jimmycv07.github.io/DiffIR2VR_web/.