
EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang

2025-10-28


Summary

This paper introduces a new method, called EchoDistill, for quickly customizing AI models that create images from text. It focuses on making these models better at understanding and generating pictures of things they haven't specifically been trained on.

What's the problem?

Text-to-image models have become very good, and recent advances even let them generate an image in a single step. However, these fast, one-step models struggle to learn and accurately represent completely new concepts: they lack the capacity to easily absorb new information, which makes personalization difficult.

What's the solution?

The researchers developed EchoDistill, which uses two models working together: a more complex, multi-step model (the 'teacher') and a faster, one-step model (the 'student'). The student learns from the teacher, then 'echoes' information back to help the teacher improve too. Both models share the same understanding of text, and the student is further refined using techniques that make its images look realistic and consistent with the teacher's style. The key is this back-and-forth learning process, where the student’s speed helps the teacher, and the teacher’s knowledge helps the student.
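The back-and-forth process above can be sketched in miniature. The code below is a toy numpy illustration, not the paper's implementation: the "teacher" and "student" are just linear maps standing in for multi-step and one-step diffusion models, the shared text encoder is a fixed projection, and the adversarial loss against real images is omitted. Only the structure (distill teacher → student, then echo student → teacher, with a shared text encoder) mirrors the described method.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy stand-ins for the two generators: each maps a text embedding to an
# "image" vector. Real EchoDistill uses diffusion networks; linear maps
# are an illustrative assumption.
teacher_W = rng.normal(size=(dim, dim))
student_W = rng.normal(size=(dim, dim))
text_proj = rng.normal(size=(dim, dim))   # shared "text encoder"
probes = rng.normal(size=(16, dim))       # fixed prompts to measure agreement
lr = 0.01

def generate(W, text_emb):
    # One forward pass = one "generation" in this toy setup.
    return W @ text_emb

def agreement_gap():
    # Mean squared disagreement between student and teacher on held-out prompts.
    embs = probes @ text_proj.T
    return float(np.mean([(generate(student_W, e) - generate(teacher_W, e)) ** 2
                          for e in embs]))

gap_before = agreement_gap()

for step in range(300):
    prompt = rng.normal(size=dim)
    text_emb = text_proj @ prompt                  # shared text encoder

    # --- Distill: teacher -> student -----------------------------------
    target = generate(teacher_W, text_emb)         # "multi-step" teacher output
    pred = generate(student_W, text_emb)           # "one-step" student output
    # Gradient of the MSE alignment loss w.r.t. the student weights.
    # (The paper additionally uses an adversarial loss; omitted here.)
    grad_s = np.outer(pred - target, text_emb) * (2 / dim)
    student_W -= lr * grad_s

    # --- Echo: student -> teacher --------------------------------------
    # The fast student generates samples cheaply; the teacher is nudged
    # toward agreeing with them on the same prompt, a rough stand-in for
    # the paper's "bidirectional echoing refinement".
    echo = generate(student_W, text_emb)
    out = generate(teacher_W, text_emb)
    grad_t = np.outer(out - echo, text_emb) * (2 / dim)
    teacher_W -= 0.1 * lr * grad_t  # smaller step: the teacher changes slowly

gap_after = agreement_gap()
print(gap_before, gap_after)
```

Running this, the student–teacher gap shrinks substantially, which is the point of the bidirectional loop: each direction of distillation pulls the two models toward a shared notion of the concept, rather than the student merely chasing a frozen teacher.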

Why it matters?

This research is important because it provides a way to rapidly and effectively personalize image generation models. This means you can quickly teach an AI to create images of very specific things, even if it hasn't seen them before, without sacrificing image quality. It opens up possibilities for more customized and creative image generation applications.

Abstract

Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images even in a single step. However, personalizing these models to incorporate novel concepts remains a challenge due to the limited capacity of one-step models to capture new concept distributions effectively. We propose a bidirectional concept distillation framework, EchoDistill, to enable one-step diffusion personalization (1-SDP). Our approach involves an end-to-end training process where a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously. The concept is first distilled from the teacher model to the student, and then echoed back from the student to the teacher. During EchoDistill, we share the text encoder between the two models to ensure consistent semantic understanding. Following this, the student model is optimized with adversarial losses to align with the real image distribution and with alignment losses to maintain consistency with the teacher's output. Furthermore, we introduce the bidirectional echoing refinement strategy, wherein the student model leverages its faster generation capability to provide feedback to the teacher model. This bidirectional concept distillation mechanism not only enhances the student's ability to personalize novel concepts but also improves the generative quality of the teacher model. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods under the 1-SDP setup, establishing a novel paradigm for rapid and effective personalization in T2I diffusion models.