
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jinwang Wang, Rui Zhao, Yan Xing, Weihua Chen, Fan Wang

2025-11-04


Summary

This paper introduces a new system called UniLumos for realistically changing the lighting in images and videos. It focuses on making the lighting look physically correct, meaning shadows and highlights behave as they would in the real world.

What's the problem?

Current methods for relighting, which use a technique called diffusion models, often create unrealistic results. They work in a way that doesn't fully understand how light interacts with objects, leading to issues like overly bright spots, shadows in the wrong places, and objects appearing to be lit from impossible angles. These models are also slow because they require many steps to refine the image.

What's the solution?

UniLumos addresses this by feeding information about the 3D structure of the scene, specifically depth and surface normals, back into the relighting process. By supervising the model with geometry maps extracted from its own outputs, it keeps lighting effects aligned with where objects are and how light should fall on them. To speed things up, the authors also employ a training technique called 'path consistency learning' that keeps this supervision effective even when the model runs only a few denoising steps. Finally, they design a structured six-dimensional protocol for describing lighting conditions and build a benchmark, LumosBench, to test how well different relighting methods control each lighting attribute.
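To make the geometry-feedback idea concrete, here is a minimal numpy sketch of the kind of loss described above. This is not the paper's implementation: the real system uses pretrained depth/normal estimators and a flow-matching backbone, whereas `estimate_geometry` below is a toy stand-in, and all function names are illustrative assumptions.

```python
import numpy as np

def estimate_geometry(rgb):
    """Toy stand-in for a pretrained depth/normal estimator.

    The paper extracts depth and normal maps from the model's RGB-space
    outputs with real estimator networks; here we fake depth as intensity
    and derive normals from its image-space gradients.
    """
    depth = rgb.mean(axis=-1)                       # (H, W) proxy depth
    gy, gx = np.gradient(depth)                     # image-space gradients
    normals = np.stack([-gx, -gy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return depth, normals

def geometry_feedback_loss(relit_rgb, source_rgb):
    """L1 mismatch between the geometry of the relit output and the
    source scene: relighting should change shading, not structure."""
    d_out, n_out = estimate_geometry(relit_rgb)
    d_src, n_src = estimate_geometry(source_rgb)
    depth_term = np.abs(d_out - d_src).mean()
    normal_term = np.abs(n_out - n_src).mean()
    return depth_term + normal_term

rng = np.random.default_rng(0)
scene = rng.random((32, 32, 3))
relit = np.clip(scene * 1.4, 0.0, 1.0)    # crude "relit" version

print(geometry_feedback_loss(scene, scene))   # identical inputs -> 0.0
print(geometry_feedback_loss(relit, scene))   # structural drift -> > 0
```

The key design point is that the loss is computed in RGB/visual space on the decoded output, not in the diffusion model's latent space, which is why the paper needs few-step sampling (via path consistency learning) to keep this supervision affordable during training.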

Why it matters?

This work is important because it significantly improves the realism and speed of relighting technology. Better relighting has practical applications in areas like visual effects for movies and video editing, and it also allows for more artistic control over images and videos. The new benchmark they created will also help researchers compare and improve relighting algorithms in the future.

Abstract

Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.