Subject-Consistent and Pose-Diverse Text-to-Image Generation
Zhanxin Gao, Beier Zhu, Liang Yao, Jian Yang, Ying Tai
2025-07-15
Summary
This paper introduces CoDi, a new way to create images from text that keeps the main subject consistent while varying its pose or position. It does this in two stages: first transporting the subject's identity to each new pose, and then refining the details through the diffusion denoising process.
What's the problem?
The problem is that when AI generates pictures from text descriptions, it often changes the subject too much from image to image, or it cannot show the subject in different poses while keeping it recognizable as the same character or object.
What's the solution?
CoDi solves this with a two-step approach: it first transports the subject's identity to each new pose, and then refines the image with diffusion, a process that gradually turns noise into a clean image. This helps the model create pictures that look realistic and keep the subject consistent even as its pose changes.
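The two-step idea can be illustrated with a toy sketch. This is not the paper's actual method or code: the function names, the blending rule, and the mock "denoising" loop are all assumptions made up for illustration. It only shows the shape of the pipeline, where a shared identity is first mixed into each pose-specific latent (making the poses more consistent with each other), and each latent is then iteratively refined.

```python
import numpy as np

def transport_identity(identity, pose_latent, alpha=0.7):
    """Stage 1 (hypothetical): blend a shared identity vector into a
    pose-specific latent, pulling all poses toward the same subject."""
    return alpha * identity + (1 - alpha) * pose_latent

def refine(latent, steps=10):
    """Stage 2 (hypothetical): a mock stand-in for diffusion refinement
    that iteratively shrinks the residual 'noise' in the latent."""
    for _ in range(steps):
        latent = latent - 0.1 * latent  # toy denoising step
    return latent

# Two different poses of the same subject share one identity vector.
rng = np.random.default_rng(0)
identity = np.ones(4)
pose_a, pose_b = rng.normal(size=4), rng.normal(size=4)
latent_a = refine(transport_identity(identity, pose_a))
latent_b = refine(transport_identity(identity, pose_b))
```

After the transport step, the two latents are measurably closer to each other than the raw pose latents were, which is the sketch's stand-in for subject consistency across poses.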
Why it matters?
This matters because it enables more flexible AI-generated imagery: the same character or object can be shown in many different poses without losing what makes it unique, which is useful for creative work such as animation and design.
Abstract
CoDi is a two-stage text-to-image framework that maintains subject consistency while enhancing pose diversity, first transporting the subject's identity and then refining it with diffusion.