NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister
2025-12-05
Summary
This paper introduces a new way to improve diffusion models, which are used for creating images and videos, by focusing on preserving the structural information within the data during the noise addition process.
What's the problem?
Traditional diffusion models add noise to images by randomly changing both the brightness and the spatial arrangement of details, essentially scrambling the image's structure. While this works for general image creation, it's a problem when you need the generated output to have a consistent shape or structure, like when trying to realistically re-render a scene or make a simulated image look like a real photo. Destroying the spatial arrangement makes it hard to maintain geometric consistency.
What's the solution?
The researchers developed a technique called Phase-Preserving Diffusion (φ-PD) that only randomizes the brightness of details while keeping the spatial arrangement, or 'phase,' intact. They also created a special type of noise, called Frequency-Selective Structured (FSS) noise, that lets you control how much structure is preserved by adjusting a single frequency cutoff. This method works with existing diffusion models without needing any changes to their design or adding extra complexity, and doesn't slow down the image creation process.
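To make the idea concrete, here is a minimal NumPy sketch of what "preserving phase while randomizing magnitude" could look like, with a frequency-cutoff knob in the spirit of FSS noise. This is an illustrative reconstruction, not the paper's implementation: the function name `fss_noise`, the cutoff normalization, and the exact way phase is mixed above the cutoff are all assumptions.

```python
import numpy as np

def fss_noise(image, cutoff=1.0, rng=None):
    """Illustrative sketch (not the paper's code) of structured noise.

    Below the normalized frequency `cutoff`, the noise keeps the input
    image's Fourier phase; above it, the phase is random. `cutoff=1.0`
    preserves phase everywhere; `cutoff=0.0` behaves like ordinary
    unstructured noise. Magnitudes are random at every frequency.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape

    # Fourier phase of the input and of a Gaussian noise field.
    img_phase = np.angle(np.fft.fft2(image))
    noise_fft = np.fft.fft2(rng.standard_normal((h, w)))

    # Radial frequency grid, normalized so the corner frequency is 1.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2) / np.sqrt(0.5)

    # Keep the image's phase below the cutoff, random phase above it.
    keep = radius <= cutoff
    phase = np.where(keep, img_phase, np.angle(noise_fft))

    # Recombine: random magnitude everywhere, structured phase.
    return np.fft.ifft2(np.abs(noise_fft) * np.exp(1j * phase)).real
```

In a standard forward diffusion step, x_t = √ᾱ_t · x_0 + √(1−ᾱ_t) · ε, a noise field like this would stand in for the usual i.i.d. Gaussian ε, so the corruption scrambles magnitudes while the spatial layout encoded in the phase survives.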
Why it matters?
This new approach is important because it allows diffusion models to create images and videos that are more structurally sound and controllable. This is particularly useful for applications like improving the realism of simulations, making self-driving car systems work better by bridging the gap between simulated and real-world environments, and generally improving image-to-image and video-to-video translation tasks where maintaining spatial relationships is crucial.
Abstract
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/