Native-Resolution Image Synthesis (NiT)

Key Features

Native-resolution image synthesis

Arbitrary resolution and aspect ratio generation

Dynamic Tokenization

Variable-Length Sequence Processing

2D Structural Prior Injection

Flash Attention

State-of-the-art performance on ImageNet benchmarks

Strong zero-shot generalization ability

NiT introduces three key architectural innovations: Dynamic Tokenization, Variable-Length Sequence Processing, and 2D Structural Prior Injection. Dynamic Tokenization converts each image, at its native resolution, into a variable-length token sequence, which avoids input padding and the computation spent on it. Variable-Length Sequence Processing uses Flash Attention to process the resulting sequences of differing lengths. 2D Structural Prior Injection applies an axial 2D Rotary Positional Embedding (RoPE) that factorizes position into separate height and width components. Together, these innovations let NiT efficiently process images of varying resolutions and aspect ratios.
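The three ideas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not NiT's actual code: the function names (`patchify`, `pack`, `block_mask`, `axial_rope`), the patch size, and the even split of channels between height and width are all hypothetical choices, and a dense boolean mask stands in for what a variable-length Flash Attention kernel would do from the cumulative lengths.

```python
import numpy as np

def patchify(image, patch=2):
    """Turn an (H, W, C) array into (H/p * W/p, p*p*C) tokens plus (row, col) grid ids."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "sides must be multiples of the patch size"
    gh, gw = h // patch, w // patch
    tokens = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    tokens = tokens.reshape(gh * gw, patch * patch * c)
    rows, cols = np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij")
    return tokens, np.stack([rows.ravel(), cols.ravel()], axis=1)

def pack(images, patch=2):
    """Concatenate the tokens of all images with no padding; return cumulative lengths."""
    toks, pos = zip(*(patchify(im, patch) for im in images))
    lengths = [t.shape[0] for t in toks]
    cu_seqlens = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int32)
    return np.concatenate(toks), np.concatenate(pos), cu_seqlens

def block_mask(cu_seqlens):
    """Dense stand-in for variable-length attention: tokens attend only within their own image."""
    total = int(cu_seqlens[-1])
    seg = np.searchsorted(cu_seqlens[1:], np.arange(total), side="right")
    return seg[:, None] == seg[None, :]

def axial_rope(x, pos, base=10000.0):
    """Rotate the first half of the channels by row index and the second half by column index."""
    x = np.asarray(x, dtype=np.float64)
    n, d = x.shape
    assert d % 4 == 0, "need an even number of channels per axis"
    half = d // 2
    freqs = base ** (-np.arange(0, half, 2) / half)  # one frequency per channel pair
    out = np.empty_like(x)
    for axis in (0, 1):  # 0 = height (row), 1 = width (column)
        ang = pos[:, axis:axis + 1] * freqs
        cos, sin = np.cos(ang), np.sin(ang)
        part = x[:, axis * half:(axis + 1) * half]
        x1, x2 = part[:, 0::2], part[:, 1::2]
        rot = np.empty_like(part)
        rot[:, 0::2] = x1 * cos - x2 * sin
        rot[:, 1::2] = x1 * sin + x2 * cos
        out[:, axis * half:(axis + 1) * half] = rot
    return out

# Two inputs with different resolutions and aspect ratios share one packed sequence.
rng = np.random.default_rng(0)
wide = rng.normal(size=(4, 6, 3))   # 2 x 3 grid of patches -> 6 tokens
tall = rng.normal(size=(8, 2, 3))   # 4 x 1 grid of patches -> 4 tokens
tokens, positions, cu_seqlens = pack([wide, tall])
mask = block_mask(cu_seqlens)
rotated = axial_rope(tokens, positions)
```

Packing keeps the total token count equal to the sum of the per-image counts, so no compute is spent on padding tokens. Because each axis is rotated independently, the dot product between two rotated tokens depends only on their row offset and column offset, which is what lets the same embedding apply to grids of any height and width.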

NiT has demonstrated state-of-the-art performance on both the ImageNet 256x256 and 512x512 benchmarks, achieving FID (Fréchet Inception Distance, lower is better) scores of 2.03 and 1.45, respectively. Moreover, NiT exhibits strong zero-shot generalization, with an FID score of 4.52 at the unseen 1024x1024 resolution. NiT also outperforms baselines on both resolution generalization and aspect ratio generalization, showing that it can generate high-quality images across diverse resolutions and aspect ratios. These results make NiT a valuable tool for applications such as image synthesis, image editing, and computer vision.
