SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin
2025-02-04
Summary
This paper introduces SliderSpace, a framework that helps users understand and control how diffusion models create images from text descriptions. It makes it easier to explore and adjust the visual styles, concepts, and variations in the images these models generate.
What's the problem?
Diffusion models, which generate images from text, offer little direct control over specific visual features of the images they produce. Current methods require users to manually specify each attribute they want to change, which is time-consuming and inflexible. This makes it hard for users to explore the full creative range of these models or to make precise adjustments to their outputs.
What's the solution?
The researchers developed SliderSpace, a framework that automatically breaks down the visual capabilities of diffusion models into human-understandable controls. Instead of requiring users to define each feature manually, SliderSpace discovers multiple ways to adjust an image from just a single text prompt. Each discovered direction is trained as a low-rank adaptor, which keeps the controls lightweight and easy to combine. SliderSpace was tested on tasks such as exploring artistic styles, decomposing complex concepts, and increasing the variety of generated images; in these experiments it outperformed older methods by producing more diverse and useful image variations.
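The low-rank adaptor idea above can be illustrated with a minimal sketch: each slider direction is stored as a pair of small matrices whose product is a low-rank update to a weight matrix, and a per-slider scale acts as the "slider knob." The function name and shapes here are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def apply_sliders(W, sliders, scales):
    """Add scaled low-rank updates to a weight matrix.

    Each slider is a (B, A) pair with B: (out, r) and A: (r, in),
    so B @ A is a rank-r update to W — the LoRA-style
    parameterization used for each discovered direction.
    Multiple sliders compose by simple addition.
    """
    W_out = W.copy()
    for (B, A), s in zip(sliders, scales):
        W_out += s * (B @ A)
    return W_out

# Toy example: a 4x4 weight matrix with two rank-1 slider directions.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
sliders = [(rng.standard_normal((4, 1)), rng.standard_normal((1, 4)))
           for _ in range(2)]

# Scales play the role of slider positions; setting a scale to 0
# disables that direction, and negative scales reverse it.
W_edited = apply_sliders(W, sliders, scales=[0.8, -0.5])
```

Because each adaptor touches the weights additively, several sliders can be active at once at different strengths, which is what makes the discovered directions compositional.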
Why it matters?
This research is important because it makes AI tools for image generation more accessible and powerful. By giving users better control over how images are created, SliderSpace opens up new possibilities for art, design, and other creative fields. It also helps researchers learn more about how diffusion models work, which could lead to further improvements in AI technology. This advancement could be especially useful in areas like marketing, education, and entertainment, where creating high-quality visuals quickly is essential.
Abstract
We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions decompose the visual structure of the model's knowledge effectively, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations compared to baselines. Our code, data and trained weights are available at https://sliderspace.baulab.info