OmniPSD: Layered PSD Generation with Diffusion Transformer
Cheng Liu, Yiren Song, Haofan Wang, Mike Zheng Shou
2025-12-11
Summary
This paper introduces OmniPSD, a system that uses diffusion models to work with PSD files, the layered image format used by Photoshop and other design tools. It can both create PSDs from text descriptions and decompose an existing flattened image back into individual PSD layers.
What's the problem?
Current image generation technology, even advanced diffusion models, struggles to create or understand layered image files like PSDs. These files matter for graphic design because they let designers edit individual elements independently, but preserving per-layer transparency (alpha channels) and the relationships between layers is difficult for generative models. Taking a finished, flattened image and automatically separating it back into its original layers is an equally hard problem.
What's the solution?
The researchers developed OmniPSD, a diffusion-transformer framework built on the Flux ecosystem. It learns how layers are arranged in a PSD by looking at examples. When creating a PSD from text, it lays the target layers out spatially on a single canvas and learns their compositional relationships through attention. When decomposing an image, it iteratively extracts and erases components (such as text and foreground objects) to reveal the layers underneath. A dedicated component called an RGBA-VAE preserves each layer's transparency during this process, ensuring the extracted layers look correct.
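Why transparency preservation matters here: the layers only reproduce the original image if their alpha channels survive intact, because flattening a PSD composites the layers bottom-to-top. As a point of reference (this is standard image compositing, not code from the paper), flattening an RGBA layer stack uses the Porter-Duff "over" operator:

```python
import numpy as np

def composite_over(fg, bg):
    """Porter-Duff 'over': composite an RGBA foreground onto an RGBA background.
    Both inputs are float arrays in [0, 1] with shape (H, W, 4)."""
    fa, ba = fg[..., 3:4], bg[..., 3:4]
    out_a = fa + ba * (1.0 - fa)                                 # resulting alpha
    out_rgb = fg[..., :3] * fa + bg[..., :3] * ba * (1.0 - fa)   # premultiplied blend
    safe_a = np.where(out_a > 0.0, out_a, 1.0)                   # avoid divide-by-zero
    return np.concatenate([out_rgb / safe_a, out_a], axis=-1)

def flatten_layers(layers):
    """Flatten a bottom-to-top list of RGBA layers into one RGBA image."""
    out = layers[0]
    for layer in layers[1:]:
        out = composite_over(layer, out)
    return out
```

If a model corrupts the alpha channel of even one layer, every pixel it covers blends incorrectly in the flattened result, which is why OmniPSD uses a separate RGBA-VAE rather than an RGB-only autoencoder.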
Why it matters?
This work is important because it opens up new possibilities for automated graphic design and image editing. Being able to generate layered designs from text or decompose existing images into editable layers could significantly speed up workflows for designers and allow for new creative tools. It’s a step towards computers understanding and manipulating images at a more sophisticated level, similar to how a human designer would.
Abstract
Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
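The iterative in-context decomposition described in the abstract can be sketched as a peel-off loop. This is a hypothetical outline, not the paper's implementation: `extract_top_layer` stands in for the diffusion model's editing step, assumed to return the extracted RGBA layer (or `None` when nothing extractable remains) together with the image with that layer erased.

```python
def decompose_to_layers(image, extract_top_layer, max_layers=8):
    """Iteratively peel layers off a flattened image (hypothetical sketch).

    `extract_top_layer(image)` is assumed to return (rgba_layer, residual),
    with rgba_layer=None once nothing extractable remains."""
    layers = []
    for _ in range(max_layers):
        rgba_layer, residual = extract_top_layer(image)
        if rgba_layer is None:
            break
        layers.append(rgba_layer)   # topmost remaining component
        image = residual            # image with that component erased
    layers.append(image)            # whatever is left becomes the background
    return layers[::-1]             # bottom-to-top, matching PSD layer order
```

The loop mirrors the abstract's description: text and foreground components are extracted and erased one at a time, and the final residual serves as the background layer.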