UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
Ruiheng Zhang, Jingfeng Yao, Huangxuan Zhao, Hao Yan, Xiao He, Lei Chen, Zhou Wei, Yong Luo, Zengmao Wang, Lefei Zhang, Dacheng Tao, Bo Du
2026-01-21
Summary
This paper introduces UniX, an artificial intelligence model designed to both understand and create chest X-ray images, two tasks that current models struggle to do well at the same time.
What's the problem?
Existing AI models for medical images often have to choose between being good at *understanding* what's in an X-ray (like identifying diseases) and being good at *generating* realistic X-ray images. The two tasks pull in opposite directions: understanding needs to abstract the image down to key features, while generating needs to recreate every pixel-level detail. Models that try to do both with one shared set of parameters usually end up mediocre at one or both.
What's the solution?
The researchers built UniX with two separate branches: an autoregressive branch that focuses on understanding the X-ray, and a diffusion branch that creates high-quality images. The two are connected by a cross-modal self-attention mechanism that lets the understanding features guide the image-generation process (a rough sketch of this layout follows below). They also carefully cleaned the training data and used a multi-stage training strategy to help the two branches work together effectively.
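To make the decoupled design concrete, here is a minimal PyTorch sketch of how an autoregressive understanding branch might condition a diffusion denoiser through cross-modal attention. All module names, sizes, and the exact wiring here are illustrative assumptions, not the paper's actual architecture; see the official repository for the real implementation.

```python
import torch
import torch.nn as nn

class UniXSketch(nn.Module):
    """Illustrative two-branch layout (NOT the paper's actual code):
    an autoregressive transformer handles understanding, a diffusion
    denoiser handles generation, and cross-modal attention lets the
    understanding features guide denoising."""

    def __init__(self, vocab_size=32000, dim=768, heads=8, depth=4):
        super().__init__()
        # Understanding branch: a causal transformer over report/question tokens.
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.ar_branch = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), depth
        )
        # Generation branch: a transformer denoiser over noisy image latents.
        self.denoiser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), depth
        )
        # Cross-modal attention: latents (queries) attend to understanding features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, report_tokens, noisy_latents):
        # 1) Encode text causally, as an autoregressive branch would.
        mask = nn.Transformer.generate_square_subsequent_mask(
            report_tokens.size(1)
        ).to(report_tokens.device)
        und_feats = self.ar_branch(self.token_emb(report_tokens), mask=mask)
        # 2) Inject understanding features into the generation stream.
        guided, _ = self.cross_attn(noisy_latents, und_feats, und_feats)
        # 3) Denoise; the output would feed a standard diffusion objective.
        return self.denoiser(noisy_latents + guided)

# Toy forward pass with made-up shapes.
model = UniXSketch()
tokens = torch.randint(0, 32000, (2, 64))   # (batch, text length)
latents = torch.randn(2, 256, 768)          # (batch, patches, dim)
noise_pred = model(tokens, latents)         # (2, 256, 768)
```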
Why it matters?
UniX performs on par with models specifically designed for *either* understanding or generating X-rays, but it can do *both*. This is a big step forward because it means we can potentially build more versatile and efficient AI systems for medical imaging, and UniX does so with only about a quarter of the parameters of an earlier unified model, LLM-CXR.
Abstract
Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently lead to compromised performance in one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Code and models are available at https://github.com/ZrH42/UniX.
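For the cross-modal self-attention specifically, one plausible reading is that understanding tokens and generation latents are concatenated and processed by a single self-attention operation, so every image latent can attend to the understanding features. The sketch below illustrates that reading; the function name, shapes, and concatenation scheme are assumptions, not the paper's verified implementation.

```python
import torch
import torch.nn as nn

def cross_modal_self_attention(und_tokens, gen_tokens, attn: nn.MultiheadAttention):
    """One reading of "cross-modal self-attention": run standard self-attention
    over the concatenation of understanding and generation tokens, so image
    latents can attend to (and be guided by) the understanding features.
    und_tokens: (B, Tu, D) understanding-branch features
    gen_tokens: (B, Tg, D) diffusion-branch latents
    """
    joint = torch.cat([und_tokens, gen_tokens], dim=1)  # (B, Tu + Tg, D)
    fused, _ = attn(joint, joint, joint)                # full self-attention
    # Keep only the generation positions; they now carry understanding context.
    return fused[:, und_tokens.size(1):, :]

# Usage with toy shapes.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
und = torch.randn(2, 77, 768)
gen = torch.randn(2, 256, 768)
guided = cross_modal_self_attention(und, gen, attn)     # (2, 256, 768)
```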