The DiffSensei framework consists of two stages. In the first stage, a multi-character, customized manga image generation model with layout control is trained: the dialog embedding is added to the noised latent after the first convolution layer, and all parameters of the U-Net and the feature extractor are updated. In the second stage, the LoRA and resampler weights of a multimodal large language model (MLLM) are fine-tuned to adapt the source character features to the text prompt; the stage-one model serves as the image generator, and its weights are kept frozen.
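The following is a minimal PyTorch sketch of this two-stage parameter setup, assuming the paper's architecture as described above. The identifiers (`conv_in`, `forward_body`, `lora_`, `resampler`) are illustrative assumptions, not the authors' actual module names.

```python
import torch
import torch.nn as nn


class DialogConditionedUNet(nn.Module):
    """Stage 1: inject a dialog embedding into the noised latent
    right after the U-Net's first convolution layer."""

    def __init__(self, unet: nn.Module, dialog_dim: int, latent_channels: int):
        super().__init__()
        self.unet = unet
        # Project the dialog embedding to the latent feature width so it
        # can be added element-wise to the post-conv feature map.
        self.dialog_proj = nn.Linear(dialog_dim, latent_channels)

    def forward(self, noised_latent, timestep, char_features, dialog_emb):
        # Assumes the wrapped U-Net exposes `conv_in` (first conv) and a
        # `forward_body` over the remaining blocks -- hypothetical names.
        h = self.unet.conv_in(noised_latent)
        h = h + self.dialog_proj(dialog_emb)[..., None, None]  # add dialog embedding
        return self.unet.forward_body(h, timestep, char_features)


def stage2_trainable_params(generator: nn.Module, mllm: nn.Module):
    """Stage 2: freeze the stage-1 image generator and train only the
    MLLM's LoRA and resampler weights."""
    for p in generator.parameters():
        p.requires_grad_(False)  # generator stays frozen in stage 2
    trainable = [
        p for name, p in mllm.named_parameters()
        if "lora_" in name or "resampler" in name
    ]
    for p in trainable:
        p.requires_grad_(True)
    return trainable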
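In stage two, only the tensors returned by `stage2_trainable_params` would be handed to the optimizer, so the image generator receives no gradient updates while the MLLM learns to adapt character features to the prompt.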
DiffSensei is accompanied by a large-scale dataset called MangaZero, which contains 43,264 manga pages and 427,147 annotated panels, supporting the depiction of varied character interactions and movements across sequential frames. Extensive experiments demonstrate that DiffSensei outperforms existing models, marking a significant advance in manga generation through text-adaptable character customization. The code, model, and dataset will be open-sourced to the community to enable further development and research in this area.