ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting

Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, Minnan Luo

2024-11-29

Summary

This paper presents ChatGen, a system that automates the process of generating images from text by allowing users to describe their needs in a casual chatting style, making it easier and faster to create desired images.

What's the problem?

Creating images from text descriptions can be complicated and frustrating because users often have to work through several steps by hand: writing detailed prompts, selecting the right models, and configuring generation arguments. This trial-and-error process is time-consuming and doesn't always lead to the desired image.

What's the solution?

The authors propose a task called Automatic Text-to-Image (T2I) generation, in which the system handles prompt crafting, model selection, and argument configuration so that users only need to describe what they want in a casual chat. They introduce ChatGenBench, a benchmark of high-quality paired data with diverse freestyle inputs, which makes it possible to evaluate automatic T2I systems at every step. They also develop ChatGen-Evo, a multi-stage evolution strategy that progressively equips the model with the skills it needs to automate image generation. In the authors' evaluation, ChatGen-Evo improves both step-wise accuracy and image quality over various baselines.

Why does it matter?

This research is important because it aims to let people create images from text without technical knowledge of prompts, models, or generation settings. By streamlining the image generation process, ChatGen could benefit fields such as art, marketing, and education, allowing more people to express their ideas visually.

Abstract

Despite the significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios. This challenge arises from the complexity and uncertainty of tedious steps such as crafting suitable prompts, selecting appropriate models, and configuring specific arguments, making users resort to labor-intensive attempts for desired images. This paper proposes Automatic T2I generation, which aims to automate these tedious steps, allowing users to simply describe their needs in a freestyle chatting way. To systematically study this problem, we first introduce ChatGenBench, a novel benchmark designed for Automatic T2I. It features high-quality paired data with diverse freestyle inputs, enabling comprehensive evaluation of automatic T2I models across all steps. Additionally, recognizing Automatic T2I as a complex multi-step reasoning task, we propose ChatGen-Evo, a multi-stage evolution strategy that progressively equips models with essential automation skills. Through extensive evaluation across step-wise accuracy and image quality, ChatGen-Evo significantly enhances performance over various baselines. Our evaluation also uncovers valuable insights for advancing automatic T2I. All our data, code, and models will be available at https://chengyou-jia.github.io/ChatGen-Home
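
To make the three automated steps concrete, here is a minimal Python sketch of how the output of an automatic T2I system might be represented and compared step by step against a reference. It is an illustration only, not the authors' code: the names T2IPlan and stepwise_match, the field layout, and the exact-match comparison are all assumptions, and the summary above does not say how ChatGenBench actually scores each step.

```python
from dataclasses import dataclass, field


@dataclass
class T2IPlan:
    """Hypothetical container for the three steps named in the abstract."""
    prompt: str                                     # crafted prompt
    model: str                                      # selected model identifier
    arguments: dict = field(default_factory=dict)   # generation arguments


def stepwise_match(predicted: T2IPlan, reference: T2IPlan) -> dict:
    """Compare a predicted plan with a reference plan, one step at a time.

    Exact match is an assumption made for this sketch. The prompt is left
    out because exact string equality is a poor proxy for prompt quality.
    """
    return {
        "model": predicted.model == reference.model,
        "arguments": predicted.arguments == reference.arguments,
    }


# Placeholder values, not taken from the paper or its benchmark.
reference = T2IPlan(prompt="a cat on a sofa", model="model-a",
                    arguments={"steps": 30})
predicted = T2IPlan(prompt="a fluffy cat on a sofa", model="model-a",
                    arguments={"steps": 20})
result = stepwise_match(predicted, reference)
```

In this example the predicted plan selects the same model as the reference but a different argument setting, so only the model step counts as a match. Separating the steps this way shows where a system goes wrong, which a single image-quality score would hide.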