Learning to Refocus with Video Diffusion Models
SaiKiran Tedla, Zhoutong Zhang, Xuaner Zhang, Shumian Xin
2025-12-24
Summary
This paper presents a new AI-based technique for changing which part of a photo is in focus *after* the picture has already been taken.
What's the problem?
Autofocus on cameras isn't always perfect, and sometimes you realize after taking a photo that you wanted something else to be in focus. Existing methods for fixing this after the fact often produce unrealistic results and struggle in tricky situations like low light or with moving objects.
What's the solution?
The researchers developed a system that uses a type of AI called a 'video diffusion model'. It takes a single defocused photo and generates a series of images, each focused at a slightly different depth, almost like a short video sweeping through the focus options (a 'focal stack'). You can then pick where you want the focus to be, and the result looks much more natural than with previous methods.
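To make the "pick where you want the focus" step concrete, here is a minimal sketch of how interactive refocusing can work once a focal stack exists: for a clicked pixel, score each frame by a local sharpness measure (variance of a discrete Laplacian is a common choice) and display the sharpest frame. This is an illustrative example, not the paper's method; the function names and the window size are hypothetical.

```python
import numpy as np

def local_sharpness(img, y, x, win=7):
    """Variance of a discrete Laplacian in a window centred on (y, x).
    Higher values mean the region around the pixel is more in focus."""
    # 4-neighbour discrete Laplacian of a grayscale image via shifts.
    lap = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)
    h = win // 2
    patch = lap[max(y - h, 0):y + h + 1, max(x - h, 0):x + h + 1]
    return float(patch.var())

def pick_focus_frame(stack, y, x):
    """Index of the focal-stack frame that is sharpest at pixel (y, x)."""
    return int(np.argmax([local_sharpness(frame, y, x) for frame in stack]))
```

In a real viewer, `pick_focus_frame` would run on each click, swapping the displayed frame so the chosen subject snaps into focus.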
Why does it matter?
This work is important because it could significantly improve the editing capabilities of smartphone cameras and other photography tools. It means you won't be as frustrated by slightly out-of-focus pictures, and it opens up possibilities for new creative effects and image manipulation techniques. They also shared the data they used to train their AI, which will help other researchers improve this technology further.
Abstract
Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.