The framework integrates three key components to achieve this scene-consistent bokeh control. First, a hybrid dataset pipeline combines real photographs with natural bokeh captured in the wild and synthetic blur augmentations, providing robust and diverse training examples that anchor the model’s understanding of realistic defocus. Second, the model employs defocus blur conditioning by injecting a physically interpretable blur parameter into the diffusion process via decoupled cross-attention modules, which modulate blur intensity without overwriting textual or semantic features. Third, a grounded self-attention mechanism uses a pivot image to anchor the scene layout, ensuring consistent object placement and preventing unwanted content shifts as blur levels change. Together, these components enable flexible, high-fidelity control over depth-of-field effects in both image generation and real-image editing scenarios.
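The decoupled cross-attention idea can be illustrated with a minimal sketch, assuming a toy setup in plain NumPy (function names, shapes, and the blur embedding here are hypothetical, not the paper's actual implementation): the blur branch attends separately from the text branch and is added through its own gate, so blur conditioning modulates the output without overwriting the semantic features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # Standard scaled dot-product cross-attention.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def decoupled_blur_attention(image_q, text_kv, blur_kv, blur_scale):
    # Decoupled cross-attention: the text branch and the blur branch
    # attend independently, and the blur branch is added with its own
    # gate, so blur strength is modulated without touching text features.
    text_out = cross_attention(image_q, *text_kv)
    blur_out = cross_attention(image_q, *blur_kv)
    return text_out + blur_scale * blur_out

# Toy shapes: 4 image tokens, 3 text tokens, 1 blur token, dim 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
text_kv = (rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))

# The blur token embeds a physically interpretable blur parameter
# (hypothetical embedding: here just a constant vector at that level).
blur_level = 0.7  # e.g. a normalized defocus-strength value
blur_token = np.full((1, 8), blur_level)
blur_kv = (blur_token, blur_token)

out = decoupled_blur_attention(q, text_kv, blur_kv, blur_scale=1.0)
```

Setting the blur gate to zero recovers the text-only attention output exactly, which is the sense in which the two branches are decoupled.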
Bokeh Diffusion not only advances the quality and controllability of synthetic image generation but also supports practical applications such as real image editing through inversion, letting users adjust the bokeh strength of existing photos seamlessly. Its physically grounded approach outperforms depth-estimation-based post-processing methods, which often struggle to remove existing blur or modulate it accurately. The model’s ability to produce natural bokeh even in challenging regions, such as thin structures or complex backgrounds, opens new creative possibilities for photographers, artists, and content creators. By bridging photographic realism with generative modeling, Bokeh Diffusion represents a significant step forward in controllable image synthesis and editing.
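The inversion-based editing workflow can be sketched with a toy, exactly invertible stand-in for the sampler. This is a hedged illustration only: the real method uses a conditioned diffusion model with approximate DDIM-style inversion, whereas here the "denoiser" is a small linear map whose behaviour depends on a scalar blur condition, so the invert-then-regenerate round trip can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6
W0 = 0.1 * rng.normal(size=(D, D))  # toy base "denoiser" weights
W1 = 0.1 * rng.normal(size=(D, D))  # toy blur-dependent component

def step_matrix(t, blur, steps=10):
    # One deterministic sampler update z <- M z; the blur condition
    # shifts the linear map (hypothetical stand-in for conditioning).
    a = 1.0 - 0.02 * t / steps
    return a * np.eye(D) - 0.05 * (W0 + blur * W1)

def denoise(z, blur, steps=10):
    # Generate: apply the update for t = steps-1 down to 0.
    for t in reversed(range(steps)):
        z = step_matrix(t, blur) @ z
    return z

def invert(z, blur, steps=10):
    # Invert: apply the exact inverse maps in the opposite order,
    # recovering the initial "noise" for a given image.
    for t in range(steps):
        z = np.linalg.solve(step_matrix(t, blur), z)
    return z

def edit_bokeh(image, blur_src, blur_tgt):
    # Invert the photo under its source blur level, then regenerate
    # under the target blur level to change depth of field post-capture.
    noise = invert(image, blur_src)
    return denoise(noise, blur_tgt)

image = rng.normal(size=D)
same = edit_bokeh(image, blur_src=0.2, blur_tgt=0.2)  # round trip
```

Regenerating at the source blur level reconstructs the input, while a different target blur level changes only what the blur-dependent part of the map controls; in the full method, grounded self-attention on the pivot image is what keeps the layout fixed during this regeneration.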
Key features include:
- Explicit defocus blur conditioning for precise and continuous bokeh control
- Hybrid training pipeline combining real in-the-wild images and synthetic blur augmentations
- Decoupled cross-attention modules that preserve semantic content during blur modulation
- Grounded self-attention using pivot images to maintain consistent scene layout
- Support for real image editing via inversion, adjusting bokeh strength post-capture
- Lens-like bokeh effects that preserve the underlying scene structure
- Greater naturalness and flexibility than traditional post-processing methods
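The grounded self-attention feature above can also be sketched schematically, assuming the common extended-attention formulation (a NumPy toy; all names are illustrative, not the paper's code): the current image's queries attend over its own keys/values concatenated with those of the pivot image, which anchors the layout to the pivot as the blur level varies.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grounded_self_attention(q, k, v, pivot_k, pivot_v):
    # Grounded self-attention (schematic): queries see both the current
    # image's tokens and the pivot image's tokens, so scene layout is
    # pulled toward the pivot while blur conditioning varies elsewhere.
    k_all = np.concatenate([k, pivot_k], axis=0)
    v_all = np.concatenate([v, pivot_v], axis=0)
    scores = q @ k_all.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_all

# Toy shapes: 5 tokens per image, dim 8.
rng = np.random.default_rng(2)
q = rng.normal(size=(5, 8))
k, v = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
pivot_k, pivot_v = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))

out = grounded_self_attention(q, k, v, pivot_k, pivot_v)
```

Because some attention mass is always assigned to the pivot tokens, the output differs from plain self-attention over the current image alone; that shared anchor is what keeps object placement consistent across blur levels.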