LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

Guocheng Gordon Qian, Ruihang Zhang, Tsai-Shien Chen, Yusuf Dalva, Anujraaj Argo Goyal, Willi Menapace, Ivan Skorokhodov, Meng Dong, Arpit Sahni, Daniil Ostashev, Ju Hu, Sergey Tulyakov, Kuan-Chieh Jackson Wang

2025-10-24

Summary

This paper introduces LayerComposer, a new system for creating personalized images from text descriptions, especially when those images need to include multiple people or objects.

What's the problem?

Current AI image generators are really good at making realistic pictures, but they struggle when you want precise control over *where* things are placed in the image and how they relate to each other, especially if you want several different people or objects in the same scene. They also have trouble keeping the details of each person or object consistent when you're trying to arrange them.

What's the solution?

LayerComposer works like a digital art program with layers. Each person or object gets its own layer, so you can move and resize them without disturbing the others. It also has a 'locking' feature: you can lock a layer to keep it exactly as it is while letting the AI adjust the remaining layers so everything fits together naturally. Under the hood, this reuses the positional embeddings the AI already has and pairs them with a new way of sampling training examples, so the core model itself doesn't need to change.
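To make the layered-canvas idea concrete, here is a minimal sketch of what such a representation might look like as a data structure. All names and fields here are illustrative assumptions, not the paper's actual implementation; the point is just that each subject carries its own placement and a lock flag that downstream generation could respect:

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """One subject on the canvas: a reference image plus its placement."""
    subject: str          # hypothetical ID/path for the subject's reference image
    x: int                # top-left corner on the canvas
    y: int
    width: int
    height: int
    locked: bool = False  # locked layers should be preserved with high fidelity

@dataclass
class LayeredCanvas:
    width: int
    height: int
    layers: list = field(default_factory=list)

    def add_layer(self, layer: Layer) -> None:
        self.layers.append(layer)

    def move(self, index: int, x: int, y: int) -> None:
        """Reposition a layer; locked layers refuse edits, mirroring the
        'keep this exactly as it is' behavior described above."""
        layer = self.layers[index]
        if layer.locked:
            raise ValueError("cannot move a locked layer")
        layer.x, layer.y = x, y

    def conditioning(self):
        """Flatten the canvas into (subject, bounding box, locked) tuples
        that a generator could consume as spatial conditioning."""
        return [
            (l.subject, (l.x, l.y, l.x + l.width, l.y + l.height), l.locked)
            for l in self.layers
        ]

# Example: two subjects, one locked in place, one freely movable.
canvas = LayeredCanvas(1024, 1024)
canvas.add_layer(Layer("person_a.png", 100, 200, 256, 512, locked=True))
canvas.add_layer(Layer("person_b.png", 600, 250, 256, 512))
canvas.move(1, 500, 300)  # fine: layer 1 is unlocked
print(canvas.conditioning())
```

In this toy version, the lock is enforced at the data-structure level; in the actual system, locking is a property the generative model learns to respect through training, not a hard constraint in code.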

Why it matters?

This is important because it gives users much more creative control over AI-generated images, making it possible to create complex scenes with multiple subjects that look exactly how you want them to. It’s a step towards making AI image generation more useful for things like design and storytelling where precise control is key.

Abstract

Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive framework for personalized, multi-subject text-to-image generation. Our approach introduces two main contributions: (1) a layered canvas, a novel representation in which each subject is placed on a distinct layer, enabling occlusion-free composition; and (2) a locking mechanism that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. Similar to professional image-editing software, the proposed layered canvas allows users to place, resize, or lock input subjects through intuitive layer manipulation. Our versatile locking mechanism requires no architectural changes, relying instead on inherent positional embeddings combined with a new complementary data sampling strategy. Extensive experiments demonstrate that LayerComposer achieves superior spatial control and identity preservation compared to the state-of-the-art methods in multi-subject personalized image generation.