Light-X: Generative 4D Video Rendering with Camera and Illumination Control

Tianqi Liu, Zhaoxi Chen, Zihao Huang, Shaocong Xu, Saining Zhang, Chongjie Ye, Bohan Li, Zhiguo Cao, Wei Li, Hao Zhao, Ziwei Liu

2025-12-04

Summary

This paper introduces Light-X, a system that takes a single ordinary video and generates realistic new videos of the same scene in which you can control both the camera's movement and the lighting.

What's the problem?

Existing methods for changing the lighting in videos, or for creating new views of a scene, often struggle to do both well at the same time. They either make the lighting look realistic but the video flickers from frame to frame, or keep the video smooth but make the lighting look artificial. A further challenge is the lack of training data: it's hard to find videos of the same scene captured from multiple angles under different lighting conditions, which is exactly what a model would need to learn this task.

What's the solution?

The researchers tackled this by separating how the scene's shape and movement are represented from how the lighting is represented. They use 'dynamic point clouds' to capture the geometry and motion, and then separately control the lighting by 'relighting' a single frame and projecting it onto the scene. To get around the lack of training data, they created a system called Light-Syn that automatically generates realistic training examples from regular videos, essentially creating the multi-view, multi-illumination data they needed.
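To make the disentangled-conditioning idea concrete, here is a minimal NumPy sketch of the two signals described above: the scene geometry as a point cloud projected along a user-chosen camera pose (a depth map), and the illumination as the colors of a relit frame carried by those same points and projected into the same view. All function names, the pinhole-camera setup, and the nearest-point splatting are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def project_points(points, K, pose):
    """Project 3D world points (N,3) through a world-to-camera pose (4,4)
    and pinhole intrinsics K (3,3); returns pixel coords (N,2) and depths (N,)."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # homogeneous
    cam = (pose @ pts_h.T).T[:, :3]                # camera-space coordinates
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]       # perspective divide, depth

def render_condition_maps(points, relit_colors, K, pose, hw):
    """Splat the dynamic point cloud into two conditioning maps for one frame:
    a geometry map (depth) and an illumination map (colors from the relit frame).
    A z-buffer keeps only the nearest point per pixel."""
    h, w = hw
    uv, depth = project_points(points, K, pose)
    geo = np.zeros((h, w))
    light = np.zeros((h, w, 3))
    zbuf = np.full((h, w), np.inf)
    for (u, v), z, c in zip(uv, depth, relit_colors):
        x, y = int(round(u)), int(round(v))
        if 0 <= x < w and 0 <= y < h and 0 < z < zbuf[y, x]:
            zbuf[y, x] = z
            geo[y, x] = z          # geometry/motion cue
            light[y, x] = c        # illumination cue from the relit frame
    return geo, light

# Toy usage: two points in front of an identity camera.
K = np.array([[50.0, 0, 32], [0, 50.0, 32], [0, 0, 1]])
pose = np.eye(4)
points = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0]])
relit = np.array([[1.0, 0.2, 0.1], [0.1, 0.8, 0.9]])   # colors after relighting frame 0
geo, light = render_condition_maps(points, relit, K, pose, (64, 64))
```

Because the relit colors ride on the same point cloud as the geometry, moving the camera changes where they land in the image but not what the lighting looks like, which is the disentanglement the paper describes.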

Why it matters?

This work is important because it’s a step towards being able to realistically generate entire scenes from just a single video. Being able to control both the camera and the lighting opens up possibilities for creating special effects, virtual reality experiences, and even editing existing videos in a much more powerful and flexible way than currently possible.

Abstract

Recent advances in illumination control extend image-based methods to video, yet still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse-mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.