DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Xiaoxiao He, Ligong Han, Quan Dao, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Felix Juefei-Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas

2024-10-13

Summary

This paper introduces DICE, a method for precisely editing images and text produced by discrete diffusion models, giving users finer control over what these AI systems generate.

What's the problem?

Models that generate images and text have become quite advanced, but they often struggle to edit specific parts of existing content. Users typically have limited control over the generated results, which is frustrating when they want to make targeted changes or improvements.

What's the solution?

DICE (Discrete Inversion for Controllable Editing) addresses this problem with an inversion method for discrete diffusion models. It records the noise sequences and masking patterns used during the reverse diffusion process, the step-by-step procedure by which the model produces an image or text. With these records, the content can be reconstructed accurately and then edited without predefined masks or attention manipulation. In effect, DICE lets users adjust specific aspects of the output, such as changing colors in an image or altering words in a text, while maintaining high quality.
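
The idea of recording noise so that a sampling trajectory can be replayed can be illustrated with a toy sketch. This is not the authors' implementation: `toy_logits` is a hypothetical stand-in for a real denoiser such as VQ-Diffusion or Paella, the corruption scheme is plain random token replacement, and the constants (`VOCAB`, `LENGTH`, `STEPS`, `lam`) are arbitrary assumptions. It only shows the general principle that a residual stored at each step makes the reverse process reproduce the original tokens exactly, and that replaying the same residuals under a different condition yields an edited sequence. Masking patterns are not modeled here.

```python
import numpy as np

VOCAB, LENGTH, STEPS = 8, 6, 4  # arbitrary toy sizes (assumptions)

def toy_logits(tokens, step, cond):
    """Hypothetical denoiser: deterministic per-position logits.

    A real model would be a trained network; this stand-in only
    depends on the current tokens, the step, and an integer condition.
    """
    rng = np.random.default_rng(1000 * cond + step)
    base = rng.normal(size=(LENGTH, VOCAB))
    return base + 0.5 * np.eye(VOCAB)[tokens]

def invert(x0, cond, seed=0, lam=4.0):
    """Corrupt x0 step by step and record one residual per reverse step."""
    rng = np.random.default_rng(seed)
    traj = [x0]  # traj[t] holds the tokens at noise level t
    for t in range(1, STEPS + 1):
        corrupt = rng.random(LENGTH) < t / STEPS
        random_tokens = rng.integers(0, VOCAB, LENGTH)
        traj.append(np.where(corrupt, random_tokens, traj[-1]))
    noises = []
    for t in range(STEPS, 0, -1):
        # Residual chosen so that argmax(logits + z) equals traj[t - 1].
        target = lam * np.eye(VOCAB)[traj[t - 1]]
        noises.append(target - toy_logits(traj[t], t, cond))
    return traj[STEPS], noises

def replay(x_T, noises, cond):
    """Run the reverse process, injecting the recorded residuals."""
    x = x_T
    for i, z in enumerate(noises):
        t = STEPS - i
        x = np.argmax(toy_logits(x, t, cond) + z, axis=-1)
    return x

x0 = np.array([1, 3, 3, 0, 7, 2])
x_T, noises = invert(x0, cond=0)
reconstructed = replay(x_T, noises, cond=0)  # same condition: exact recovery
edited = replay(x_T, noises, cond=1)         # new condition: tokens may change
print(reconstructed, edited)
```

Replaying with the original condition recovers `x0` because the recorded residual cancels the model's logits at every step. Under a new condition the logits shift, so a token changes only where that shift outweighs the recorded residual; in this toy, `lam` controls how strongly the replay sticks to the original sequence.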

Why does it matter?

This research is significant because it enhances the flexibility and usability of generative AI models. By allowing more control over the editing process, DICE opens up new possibilities for creative applications in fields like graphic design and content creation. Users can get closer to what they envision rather than being limited to the model's original output.

Abstract

Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models, including multinomial diffusion and masked generative models. By recording noise sequences and masking patterns during the reverse diffusion process, DICE enables accurate reconstruction and flexible editing of discrete data without the need for predefined masks or attention manipulation. We demonstrate the effectiveness of DICE across both image and text domains, evaluating it on models such as VQ-Diffusion, Paella, and RoBERTa. Our results show that DICE preserves high data fidelity while enhancing editing capabilities, offering new opportunities for fine-grained content manipulation in discrete spaces. For the project webpage, see https://hexiaoxiao-cs.github.io/DICE/.
