UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

Chen Zhao, En Ci, Yunzhe Xu, Tiehan Fan, Shanyan Guan, Yanhao Ge, Jian Yang, Ying Tai

2025-10-29

Summary

This paper aims to improve how AI models generate highly detailed, ultra-high-resolution (UHR) images from text descriptions.

What's the problem?

Creating ultra-high-resolution images from text is difficult because there aren't enough large collections of high-quality images to train the AI, and existing training methods don't focus enough on generating those tiny, important details that make images look realistic at such high resolutions.

What's the solution?

The researchers created a new dataset called UltraHR-100K, which contains 100,000 highly detailed images, each exceeding 3K resolution, along with rich captions describing what's in them. They also developed a new post-training technique that helps the AI produce fine details in two ways: by concentrating learning on the denoising steps that matter most for detail, and by using a mathematical tool called the Discrete Fourier Transform (DFT) to preserve high-frequency information, which is what carries fine detail in an image.
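To make the link between the DFT and fine detail concrete, here is a minimal sketch that measures how much of an image's spectral energy sits at high spatial frequencies. It is an illustration only, not code from the paper: the function name `high_freq_energy_ratio` and the `cutoff` value are assumptions chosen for this example.

```python
import numpy as np

def high_freq_energy_ratio(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency `cutoff`.

    `image` is a 2D grayscale array; `cutoff` is in cycles per pixel.
    Both the name and the default cutoff are illustrative assumptions.
    """
    # 2D DFT of the image and its power spectrum.
    energy = np.abs(np.fft.fft2(image)) ** 2
    # Radial frequency of every DFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[radius > cutoff].sum() / total)

# A smooth image: one slow horizontal wave (2 cycles across 64 pixels).
x = np.arange(64)
smooth = np.sin(2 * np.pi * 2 * x / 64)[None, :] * np.ones((64, 1))

# A "detailed" image: white noise, whose energy is spread over all frequencies.
rng = np.random.default_rng(0)
detailed = rng.standard_normal((64, 64))

smooth_ratio = high_freq_energy_ratio(smooth)
detailed_ratio = high_freq_energy_ratio(detailed)
```

The smooth image keeps almost none of its energy above the cutoff, while the noisy one keeps most of it there, which is why a loss that looks at high-frequency components can be used to encourage detail.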

Why does it matter?

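This work is important because it addresses two bottlenecks in ultra-high-resolution text-to-image generation at once: the lack of suitable training data and the lack of training methods aimed at fine detail. The authors report that their approach significantly improves fine-grained detail and overall fidelity on their UltraHR-eval4K benchmarks. This has potential applications in areas like art, design, and scientific visualization, where high-quality imagery is crucial.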

Abstract

Ultra-high-resolution (UHR) text-to-image (T2I) generation has seen notable progress. However, two key challenges remain: (1) the absence of a large-scale high-quality UHR T2I dataset, and (2) the neglect of tailored training strategies for fine-grained detail synthesis in UHR scenarios. To tackle the first challenge, we introduce UltraHR-100K, a high-quality dataset of 100K UHR images with rich captions, offering diverse content and strong visual fidelity. Each image exceeds 3K resolution and is rigorously curated based on detail richness, content complexity, and aesthetic quality. To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. Specifically, we design (i) Detail-Oriented Timestep Sampling (DOTS) to focus learning on detail-critical denoising steps, and (ii) Soft-Weighting Frequency Regularization (SWFR), which leverages the Discrete Fourier Transform (DFT) to softly constrain frequency components, encouraging high-frequency detail preservation. Extensive experiments on our proposed UltraHR-eval4K benchmarks demonstrate that our approach significantly improves the fine-grained detail quality and overall fidelity of UHR image generation. The code is available at https://github.com/NJU-PCALab/UltraHR-100k.
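The abstract does not give the formulas behind DOTS or SWFR, so the following is only a rough sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes that detail-critical steps are the low-noise timesteps and samples them with a Beta distribution, and it assumes a frequency weight that grows linearly with radial frequency. The names `sample_timesteps`, `soft_weighted_frequency_loss`, `alpha`, `beta`, and `strength` are all hypothetical.

```python
import numpy as np

def sample_timesteps(rng, batch_size, num_steps=1000, alpha=1.0, beta=3.0):
    """Draw timesteps biased toward small t (assumed here to be detail-critical).

    Beta(alpha, beta) with beta > alpha puts more mass near 0; the choice of
    distribution and its parameters is an assumption, not taken from the paper.
    """
    u = rng.beta(alpha, beta, size=batch_size)
    return np.minimum((u * num_steps).astype(np.int64), num_steps - 1)

def soft_weighted_frequency_loss(pred, target, strength=1.0):
    """Squared spectral error with a weight that rises smoothly with frequency.

    With strength=0 this reduces to the ordinary pixel-space mean squared
    error (Parseval's theorem with the orthonormal DFT). The linear weight
    is an illustrative assumption.
    """
    # Orthonormal 2D DFT of the prediction error.
    diff = np.fft.fft2(pred - target, norm="ortho")
    # Radial frequency of every bin, normalized to [0, 1].
    fy = np.fft.fftfreq(pred.shape[-2])[:, None]
    fx = np.fft.fftfreq(pred.shape[-1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    weight = 1.0 + strength * radius / radius.max()
    return float(np.mean(weight * np.abs(diff) ** 2))

rng = np.random.default_rng(0)
timesteps = sample_timesteps(rng, batch_size=10_000)

# Two errors with equal pixel-space energy but different spatial frequency.
x = np.arange(64)
target = np.zeros((64, 64))
low_freq_error = np.sin(2 * np.pi * 2 * x / 64)[None, :] * np.ones((64, 1))
high_freq_error = np.sin(2 * np.pi * 24 * x / 64)[None, :] * np.ones((64, 1))

loss_low = soft_weighted_frequency_loss(low_freq_error, target)
loss_high = soft_weighted_frequency_loss(high_freq_error, target)
```

In this sketch the two errors have the same plain mean squared error, but the weighted loss penalizes the high-frequency one more, which is the kind of soft pressure toward detail preservation that the abstract describes. In an actual training loop the same computation would be written with a differentiable tensor library rather than NumPy.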
