I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao
2024-10-14

Summary
This paper presents I-Max, a framework for raising the resolution of images generated by pre-trained Rectified Flow Transformers (RFTs) at inference time, without any additional tuning.
What's the problem?
While Rectified Flow Transformers are efficient to train and to sample from, progress toward higher generation resolution has been slow because of limited data quality and high training costs. Tuning-free resolution extrapolation avoids retraining, but current methods often reduce generative stability, which limits their practical use.
What's the solution?
I-Max has two parts: a Projected Flow strategy for stable resolution extrapolation, and an inference toolkit that helps the model generalize its knowledge to higher resolutions. In experiments with Lumina-Next-2K and Flux.1-dev, the framework improves the stability of resolution extrapolation.
Why does it matter?
This research suggests that tuning-free resolution extrapolation is practical: beyond improving stability, I-Max brings out new image detail and corrects artifacts at higher resolutions. That could benefit fields such as computer graphics and virtual reality, and any application where high-quality images are crucial.
Abstract
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency, making them likely the most viable direction for scaling up diffusion models. However, progress in generation resolution has been relatively slow due to data quality and training costs. Tuning-free resolution extrapolation presents an alternative, but current methods often reduce generative stability, limiting practical application. In this paper, we review existing resolution extrapolation methods and introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs. I-Max features: (i) a novel Projected Flow strategy for stable extrapolation and (ii) an advanced inference toolkit for generalizing model knowledge to higher resolutions. Experiments with Lumina-Next-2K and Flux.1-dev demonstrate I-Max's ability to enhance stability in resolution extrapolation and show that it can bring image detail emergence and artifact correction, confirming the practical value of tuning-free resolution extrapolation.