
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang

2025-11-19


Summary

This paper introduces a way to create images in different artistic styles using just a single number as a 'style code'. The goal is to make it easier to generate unique, consistent visuals without long text descriptions or example images.

What's the problem?

Creating images in a specific style currently requires either very detailed text instructions, reference images to copy from, or fine-tuning of the underlying image generation model. These methods often struggle to keep a style consistent across generated images, limit the range of styles that can be produced, and are cumbersome to use because representing a style is inherently complex.

What's the solution?

The researchers developed a system called CoTyle. It works in two main steps: first, it learns a discrete 'codebook' of styles by analyzing a collection of images, turning each style into a numerical embedding. Then it uses those embeddings to condition a powerful text-to-image generator, telling it what style to use. A separate autoregressive style generator learns the distribution of these embeddings, so any numerical code can be mapped to a unique style, including entirely *new* ones. In short, a visual style can be defined by a single number, and the computer can then create images in that style, as sketched below.
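To make the flow concrete, here is a minimal, hypothetical PyTorch sketch of the inference path. A toy generator stands in for CoTyle's autoregressive style generator: it deterministically maps a numerical code to a pooled style embedding that a text-to-image model would then consume as a condition. All names, sizes, and the sampling scheme here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

EMBED_DIM = 768       # assumed style-embedding width (illustrative)
CODEBOOK_SIZE = 1024  # assumed number of discrete style entries (illustrative)

class ToyStyleGenerator(nn.Module):
    """Stand-in for CoTyle's autoregressive style generator: it seeds its
    sampling with the numerical style code, so the same code always maps
    to the same style embedding."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODEBOOK_SIZE, EMBED_DIM)

    def forward(self, style_code: int) -> torch.Tensor:
        g = torch.Generator().manual_seed(style_code)
        # Deterministically pick a short sequence of codebook indices
        # (a real model would sample these autoregressively).
        idx = torch.randint(0, CODEBOOK_SIZE, (4,), generator=g)
        return self.codebook(idx).mean(dim=0)  # pooled style embedding

def generate(prompt: str, style_code: int) -> None:
    style_emb = ToyStyleGenerator()(style_code)
    # A real pipeline would pass style_emb to a text-to-image diffusion
    # model as an extra condition (e.g., via cross-attention).
    print(f"prompt={prompt!r} code={style_code} "
          f"embedding shape={tuple(style_emb.shape)}")

generate("a lighthouse at dusk", style_code=7)
```

The key property mirrored here is determinism: the same code always reproduces the same embedding, which is what makes a style shareable and reproducible as a plain number.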

Why it matters?

This research matters because it is the first open-source method for controlling image style with simple numerical codes. Until now, this capability had been demonstrated mostly inside companies such as Midjourney, without published details. By releasing their method, the researchers let other scientists and artists build on the work and explore new possibilities in image generation, potentially leading to more creative tools and easier ways to express an artistic vision.

Abstract

Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but often struggle with style consistency, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing the novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylistic images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, a numerical style code is mapped to a unique style embedding by the style generator, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, our method offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating a style is worth one code.
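For readers curious about the discrete style codebook mentioned in the abstract, the snippet below sketches the generic vector-quantization lookup such a codebook typically relies on: a continuous style feature is snapped to its nearest codebook entry, yielding both a discrete index and a reusable style embedding. This is a minimal illustration of the general technique under assumed dimensions; CoTyle's actual codebook architecture and training losses are described in the paper.

```python
import torch
import torch.nn as nn

class StyleCodebook(nn.Module):
    """Minimal vector-quantization step, the generic mechanism behind a
    discrete style codebook (sizes here are illustrative, not CoTyle's)."""
    def __init__(self, num_codes: int = 1024, dim: int = 768):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)

    def quantize(self, style_feat: torch.Tensor):
        # Distance from the continuous feature to every codebook entry,
        # then pick the nearest one as the discrete style.
        dists = torch.cdist(style_feat.unsqueeze(0), self.codes.weight)
        idx = dists.argmin(dim=-1)
        return idx, self.codes(idx).squeeze(0)

cb = StyleCodebook()
feat = torch.randn(768)        # e.g., a style feature from an image encoder
code, emb = cb.quantize(feat)  # discrete index + quantized style embedding
print(code.item(), emb.shape)
```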