Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
Zihao Liu, Zunnan Xu, Shi Shu, Jun Zhou, Ruicheng Zhang, Zhenchao Tang, Xiu Li
2025-11-25
Summary
This paper introduces Controllable Layer Decomposition (CLD), a technique for breaking a flattened image back into its individual layers, such as separating foreground elements from the background, while giving the user fine-grained control over the result.
What's the problem?
Usually, when you create an image by combining different layers, like in Photoshop, you lose the ability to easily edit those layers *after* they've been combined. Existing methods for separating the layers again, which typically rely on image matting and inpainting, offer limited precision and control, so you can't always get the exact layers you want back. It's like trying to unbake a cake: difficult and messy!
What's the solution?
The researchers developed two main components. First, LayerDecompose-DiT (LD-DiT) separates the elements of an image into distinct layers and lets you guide the process. Second, the Multi-Layer Conditional Adapter (MLCA) injects information from the target image into the layers being generated so that they are produced more accurately. They also built a new benchmark and tailored evaluation metrics to measure how well the method works.
Why does it matter?
This is important because it allows designers to go back and edit individual parts of an image even *after* it's been finalized. The separated layers can be used directly in programs like PowerPoint, making it a practical tool for creative work and fixing mistakes without having to start over from scratch.
Abstract
This work presents Controllable Layer Decomposition (CLD), a method for achieving fine-grained and controllable multi-layer separation of raster images. In practical workflows, designers typically generate and edit each RGBA layer independently before compositing them into a final raster image. However, this process is irreversible: once composited, layer-level editing is no longer possible. Existing methods commonly rely on image matting and inpainting, but remain limited in controllability and segmentation precision. To address these challenges, we propose two key modules: LayerDecompose-DiT (LD-DiT), which decouples image elements into distinct layers and enables fine-grained control; and Multi-Layer Conditional Adapter (MLCA), which injects target image information into multi-layer tokens to achieve precise conditional generation. To enable a comprehensive evaluation, we build a new benchmark and introduce tailored evaluation metrics. Experimental results show that CLD consistently outperforms existing methods in both decomposition quality and controllability. Furthermore, the separated layers produced by CLD can be directly manipulated in commonly used design tools such as PowerPoint, highlighting its practical value and applicability in real-world creative workflows.
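The irreversibility of compositing mentioned in the abstract can be illustrated with a small sketch. The paper does not state which compositing rule it assumes; the code below uses the standard Porter-Duff "over" operator on straight-alpha RGBA values as an assumption, and the names `over`, `composite`, `stack_a`, and `stack_b` are hypothetical. It shows a single pixel for brevity: two different layer stacks flatten to the same result, so the flattened value alone cannot tell you which layers produced it.

```python
from typing import List, Tuple

# Straight (non-premultiplied) RGBA with every channel in [0, 1].
Pixel = Tuple[float, float, float, float]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom` (assumed compositing rule)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Fully transparent result; colour is undefined, so return zeros.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Pixel]) -> Pixel:
    """Flatten layers given bottom-to-top into a single pixel."""
    result: Pixel = (0.0, 0.0, 0.0, 0.0)  # start from a transparent canvas
    for layer in layers:
        result = over(layer, result)
    return result


# Stack A: opaque blue background under a half-transparent red layer.
stack_a = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.5)]
# Stack B: one opaque purple layer.
stack_b = [(0.5, 0.0, 0.5, 1.0)]

flat_a = composite(stack_a)
flat_b = composite(stack_b)
print(flat_a, flat_b)  # both flatten to the same purple pixel
```

Because many layer stacks map to the same flattened pixel, recovering layers is an under-determined problem. That ambiguity is one way to read the abstract's emphasis on controllability: extra guidance is needed to select the decomposition the designer actually wants.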