MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
Junyao Gao, Sibo Liu, Jiaxing Li, Yanan Sun, Yuanpeng Tu, Fei Shen, Weidong Zhang, Cairong Zhao, Jun Zhang
2026-04-10
Summary
This paper introduces MegaStyle, a scalable way to create a huge collection of images that are styled consistently within each style, highly varied across styles, and high quality overall.
What's the problem?
Creating datasets for training computers to understand and change image styles is hard. Existing datasets are often inconsistent within a single style: images labeled 'watercolor' don't all *look* like they were painted with watercolors. They also lack diversity, meaning too few different styles are represented, and the images themselves aren't always high quality.
What's the solution?
The researchers used powerful AI image generators that can create images in a specific style based on a text description. They built a large collection of 170,000 style descriptions (like 'pixel art' or 'photorealistic') and 400,000 descriptions of *what* to draw (like 'a cat' or 'a landscape'). Then, they combined these to automatically generate over 1.4 million images. They then used this dataset, called MegaStyle-1.4M, to train two new AI models: one to understand styles and another to transfer styles from one image to another.
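The core data-generation idea is simple: cross every style description with many content descriptions and feed each combined prompt to a text-to-image model. The sketch below illustrates that combination step only; the prompt template, the sampling of contents per style, and the example prompts are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of content-style prompt combination (not the paper's code).
style_prompts = ["pixel art", "watercolor painting", "photorealistic"]
content_prompts = ["a cat sitting on a windowsill", "a mountain landscape"]

def combine(style: str, content: str) -> str:
    # One plausible template; the actual template used in MegaStyle is not given here.
    return f"{content}, in the style of {style}"

# Pairing every style with several contents scales combinatorially:
# 170K styles x ~8 sampled contents each already yields ~1.4M prompts.
dataset_prompts = [combine(s, c) for s in style_prompts for c in content_prompts]
```

Each resulting prompt would then be sent to a text-to-image generator, and all images produced from the same style prompt share one style label.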
Why it matters?
MegaStyle provides a much better dataset for style transfer tasks. The models trained on it are better at recognizing how similar different styles are and at applying styles to new images in a way that looks natural and consistent. This is a big step forward for anyone working on AI that manipulates image styles, like creating art or editing photos.
Abstract
In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse, and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, which can generate images in the same style from a given style description. Building on this foundation, we curate a diverse and balanced prompt gallery with 170K style prompts and 400K content prompts, and generate a large-scale style dataset, MegaStyle-1.4M, via content-style prompt combinations. With MegaStyle-1.4M, we propose style-supervised contrastive learning to fine-tune a style encoder, MegaStyle-Encoder, for extracting expressive, style-specific representations, and we also train a FLUX-based style transfer model, MegaStyle-FLUX. Extensive experiments demonstrate the importance of maintaining intra-style consistency, inter-style diversity, and high quality for a style dataset, as well as the effectiveness of the proposed MegaStyle-1.4M. Moreover, when trained on MegaStyle-1.4M, MegaStyle-Encoder and MegaStyle-FLUX provide reliable style similarity measurement and generalizable style transfer, making a significant contribution to the style transfer community. More results are available at our project website https://jeoyal.github.io/MegaStyle/.
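The style-supervised contrastive objective mentioned above can be illustrated with a standard supervised contrastive loss in which images generated from the same style prompt are treated as positives. This is a minimal NumPy sketch under that assumption; MegaStyle-Encoder's actual objective, temperature, and batching may differ.

```python
import numpy as np

def style_supcon_loss(features: np.ndarray, style_labels: np.ndarray,
                      tau: float = 0.1) -> float:
    """Supervised contrastive loss with style labels defining positives.

    A sketch only: samples sharing a style label are pulled together,
    all other samples in the batch are pushed apart.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)  # L2-normalize
    sim = f @ f.T / tau                                             # scaled similarities
    n = len(style_labels)
    not_self = ~np.eye(n, dtype=bool)                               # exclude self-pairs
    pos = (style_labels[:, None] == style_labels[None, :]) & not_self
    logits = np.where(not_self, sim, -np.inf)
    # log-softmax over every other sample in the batch
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # average log-probability over positives, per anchor that has any positive
    per_anchor = [log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    return float(-np.mean(per_anchor))
```

As a sanity check, a batch whose same-style embeddings coincide yields a near-zero loss, while a batch whose embeddings are shuffled across styles yields a much larger one.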