InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai

2025-09-15

Summary

This paper introduces InfGen, a method for quickly generating images at any resolution, including 4K, without the heavy computation that high-resolution image generation normally demands.

What's the problem?

Creating high-resolution images with current AI models is slow and requires a lot of computing power. Generation time grows steeply as resolution increases; producing a single 4K image can take over 100 seconds. This makes the models impractical for applications that need quick results.
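To get a feel for why the cost climbs so steeply, here is a rough back-of-the-envelope sketch in Python. It is not taken from the paper: it assumes a latent diffusion model whose VAE shrinks each image side by 8x, a patch size of 2, and a self-attention cost that grows with the square of the token count. The function names and default values are hypothetical, chosen only for illustration.

```python
def latent_tokens(height, width, vae_factor=8, patch=2):
    """Number of tokens the diffusion model processes for one image.

    ASSUMPTION: 8x VAE downsampling and 2x2 patching, so each token
    covers a 16x16 pixel block. Real models may use other values.
    """
    stride = vae_factor * patch
    return (height // stride) * (width // stride)


def relative_attention_cost(height, width, base=(1024, 1024)):
    """Self-attention cost relative to a base resolution.

    ASSUMPTION: cost scales with the square of the token count,
    which covers only the attention layers, not the whole model.
    """
    n = latent_tokens(height, width)
    n0 = latent_tokens(*base)
    return (n / n0) ** 2


if __name__ == "__main__":
    for h, w in [(1024, 1024), (2048, 2048), (4096, 4096)]:
        tokens = latent_tokens(h, w)
        cost = relative_attention_cost(h, w)
        print(f"{w}x{h}: {tokens} tokens, {cost:.0f}x attention cost")
```

Under these assumptions, each doubling of the image side multiplies the token count by 4 and the attention cost by 16, which is why running the diffusion model directly at 4K gets expensive so quickly.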

What's the solution?

The researchers realized that existing AI image generators create a smaller, compressed 'idea' of the image first. InfGen takes that compressed idea and quickly expands it into a high-resolution image in a single step, instead of the usual slow, step-by-step process. They replaced a part of the existing image generator, called the decoder, with this faster 'one-step generator'. Importantly, this new system doesn't require retraining the original AI model, making it easy to apply to existing technology.
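The decoder swap can be pictured with a toy sketch. Everything below is a hypothetical stand-in, not InfGen's actual code: `sample_latent` plays the role of the unchanged diffusion model, `fixed_decoder` mimics a conventional decoder whose output size is tied to the latent size, and `one_step_generator` mimics a decoder that takes the target size as an input. Nearest-neighbor resizing replaces the learned networks purely so that the example runs on its own.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbor resize of a 2D list (placeholder for a learned network)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def sample_latent(size=4):
    """Stand-in for the frozen diffusion model: always a fixed-size latent."""
    return [[float(r * size + c) for c in range(size)] for r in range(size)]


def fixed_decoder(latent, factor=8):
    """Stand-in for a conventional decoder: output size = latent size * factor."""
    return resize_nearest(latent, len(latent) * factor, len(latent[0]) * factor)


def one_step_generator(latent, out_hw):
    """Stand-in for an InfGen-style decoder: one pass, caller picks the size."""
    out_h, out_w = out_hw
    return resize_nearest(latent, out_h, out_w)


if __name__ == "__main__":
    latent = sample_latent()                        # produced once, never retrained
    small = fixed_decoder(latent)                   # always 32x32 in this toy
    large = one_step_generator(latent, (135, 240))  # any requested size
    print(len(small), len(small[0]))
    print(len(large), len(large[0]))
```

The point of the sketch is the interface: the latent is produced once by the unchanged model, and only the final decoding step decides the output resolution.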

Why it matters?

InfGen makes it possible to generate high-quality, high-resolution images much faster, cutting 4K image generation to under 10 seconds. This opens up possibilities for using AI image generation in more real-time applications and makes it accessible to people without access to massive computing resources. It also simplifies upgrading existing AI models to handle higher resolutions.

Abstract

Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for producers and consumers. In current diffusion models, computational demand grows quadratically with resolution, so generating a 4K image can take over 100 seconds. To address this, we explore a second generation stage built on latent diffusion models: the fixed-size latent produced by the diffusion model is treated as the content representation, and a one-step generator decodes this compact latent into an image of arbitrary resolution. We therefore present InfGen, which replaces the VAE decoder with this new generator to produce images at any resolution from a fixed-size latent without retraining the diffusion model. This simplifies the process, reduces computational complexity, and applies to any model that uses the same latent space. Experiments show that InfGen can bring many models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
