Reflection Removal through Efficient Adaptation of Diffusion Transformers

Daniyar Zakarin, Thiemo Wandel, Anton Obukhov, Dengxin Dai

2025-12-05

Summary

This paper introduces a new approach to removing reflections from single images, using a powerful type of AI model called a diffusion transformer. Instead of building a special model just for this task, they cleverly adapt an existing, pre-trained model to handle it.

What's the problem?

Removing reflections from photos is tricky because it requires understanding what's *behind* the reflection, which isn't directly visible. Existing methods often need a lot of specific training data, and it's hard to find enough high-quality images of reflections to train these models effectively. The available datasets aren't diverse or realistic enough for good results.

What's the solution?

The researchers tackled this by using a pre-trained diffusion transformer, a type of AI known for generating realistic images. They 'taught' it to remove reflections by conditioning it on images with reflections and guiding it toward the clean versions of those images. Because real-world reflection data is limited, they created their own realistic training data using computer graphics software (Blender) to simulate reflections on glass. They also used a technique called LoRA (low-rank adaptation) to efficiently adapt the pre-trained model while updating only a small fraction of its parameters.
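The core idea behind LoRA is to freeze the large pre-trained weight matrices and learn only a small low-rank correction on top of them. A minimal NumPy sketch of that mechanism is below; the layer sizes and names here are illustrative, not taken from the paper's actual DiT architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight (stands in for one projection inside the DiT).
d_in, d_out, rank, alpha = 8, 8, 2, 4.0
W = rng.standard_normal((d_out, d_in))

# LoRA factors: only these two small matrices are trained.
A = rng.standard_normal((rank, d_in)) * 0.01  # down-projection
B = np.zeros((d_out, rank))                   # up-projection, zero-initialized

def lora_forward(x):
    # Base (frozen) path plus low-rank update, scaled by alpha / rank.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts out identical to the
# frozen one, so training begins from the pre-trained model's behavior.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: rank * (d_in + d_out) instead of d_in * d_out.
print(rank * (d_in + d_out), "LoRA params vs", d_in * d_out, "full fine-tuning")
```

At realistic layer sizes (thousands of dimensions, rank in the tens) this reduces the trainable parameter count by orders of magnitude, which is what makes adapting a large foundation model to a new task like reflection removal affordable.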

Why it matters?

This work is important because it provides a more effective and scalable way to remove reflections from images. By reusing a pre-trained model and creating synthetic data, they avoid the need for massive, real-world datasets. This means better reflection removal, even for situations the model hasn't specifically seen before, and it opens the door to more advanced image editing and restoration techniques.

Abstract

We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting. Rather than relying on task-specific architectures, we repurpose a pre-trained DiT-based foundation model by conditioning it on reflection-contaminated inputs and guiding it toward clean transmission layers. We systematically analyze existing reflection removal data sources for diversity, scalability, and photorealism. To address the shortage of suitable data, we construct a physically based rendering (PBR) pipeline in Blender, built around the Principled BSDF, to synthesize realistic glass materials and reflection effects. Efficient LoRA-based adaptation of the foundation model, combined with the proposed synthetic data, achieves state-of-the-art performance on in-domain and zero-shot benchmarks. These results demonstrate that pretrained diffusion transformers, when paired with physically grounded data synthesis and efficient adaptation, offer a scalable and high-fidelity solution for reflection removal. Project page: https://hf.co/spaces/huawei-bayerlab/windowseat-reflection-removal-web