Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev

2026-03-28

Summary

This paper explores how to make Diffusion Transformers, the models used to generate images from text descriptions, produce better results more efficiently.

What's the problem?

Diffusion Transformers are powerful, but their denoising process, the repeated 'clean-up' passes that turn random noise into an image, isn't perfectly tuned out of the box. This can produce blurry or less detailed results and requires many sampling steps to reach a final image.

What's the solution?

The researchers discovered that adding a single adjustable scaling setting to each of the transformer's core building blocks, called DiT blocks, can dramatically improve performance. They developed a method called Calibri that automatically finds the best values for these settings by treating the search as a black-box optimization problem and solving it with an evolutionary algorithm. Because Calibri tunes only about 100 settings in total, it is very parameter-efficient.
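The evolutionary search described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: it assumes a simple (1+λ)-style evolution strategy over a vector of ~100 calibration scalars, and `reward` is a hypothetical stand-in for whatever black-box quality score (e.g. an image reward model) Calibri optimizes.

```python
import random

NUM_SCALES = 100  # roughly one calibration scalar per DiT block (assumption)

def reward(scales):
    """Hypothetical black-box quality score for a calibrated model.

    In practice this would involve generating images with the scaled
    DiT and scoring them; here it simply peaks when every scale is 1.1.
    """
    return -sum((s - 1.1) ** 2 for s in scales)

def calibrate(generations=200, children=16, sigma=0.05, seed=0):
    """(1+lambda) evolution strategy over the calibration vector."""
    rng = random.Random(seed)
    best = [1.0] * NUM_SCALES          # start from the identity calibration
    best_reward = reward(best)
    for _ in range(generations):
        for _ in range(children):
            # Mutate every scale with small Gaussian noise.
            cand = [s + rng.gauss(0.0, sigma) for s in best]
            r = reward(cand)
            if r > best_reward:        # keep only improving candidates
                best, best_reward = cand, r
    return best, best_reward

scales, score = calibrate()
```

Because the search only ever queries the reward, no gradients through the diffusion model are needed, which is what makes the calibration cheap despite the model's size.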

Why it matters?

Calibri is important because it boosts the quality of images generated by Diffusion Transformers without needing to make the models much larger or more complex. It also allows these models to create good images with fewer steps, which means faster generation times. This makes creating high-quality images more accessible and efficient.

Abstract

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this insight, we propose Calibri, a parameter-efficient approach that optimally calibrates DiT components to elevate generative quality. Calibri frames DiT calibration as a black-box reward optimization problem, which is efficiently solved using an evolutionary algorithm and modifies just ~100 parameters. Experimental results reveal that despite its lightweight design, Calibri consistently improves performance across various text-to-image models. Notably, Calibri also reduces the inference steps required for image generation, all while maintaining high-quality outputs.
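To make the "single learned scaling parameter" concrete, here is a toy sketch of wrapping one block with such a scalar. The residual placement (`h + scale * block(h)`) and the toy block itself are illustrative assumptions; the abstract only states that one scaling parameter per DiT component is calibrated.

```python
def scaled_block(block_fn, scale):
    """Wrap a block with a single calibration scalar.

    Illustrative assumption: the scalar rescales the block's residual
    contribution, h -> h + scale * block(h).
    """
    def forward(h):
        out = block_fn(h)
        return [hi + scale * oi for hi, oi in zip(h, out)]
    return forward

# Toy stand-in for a DiT block's attention/MLP computation.
def toy_block(h):
    return [0.5 * hi for hi in h]

calibrated = scaled_block(toy_block, scale=2.0)
print(calibrated([1.0, 2.0]))  # -> [2.0, 4.0]
```

With one scalar per block, a model with ~100 blocks yields the ~100-parameter search space the abstract describes.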