μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

2025-07-03

Summary

This paper introduces μ²Tokenizer, a tokenizer that helps large language models generate radiology reports by combining information from medical images and text. It extracts features at multiple scales, and the resulting model is trained with direct preference optimization to improve the accuracy and usefulness of the generated reports.

What's the problem?

Automated radiology report generation is challenging: the system must interpret complex medical images and turn that understanding into clear, accurate text, which requires integrating visual and language information effectively.

What's the solution?

The researchers designed μ²Tokenizer, which works at multiple scales and combines image features with text features. The resulting model is then trained with direct preference optimization, a learning method that optimizes directly for preferred reports, so the system can generate better radiology reports without extra tuning.

Why does it matter?

This matters because it can help doctors by generating more reliable and detailed radiology reports quickly, reducing their workload and improving patient care through faster and clearer diagnoses.

Abstract

A multiscale multimodal large language model, $\mu^2$LLM, enhances automated radiology report generation by integrating visual and textual features and optimizing report quality through direct preference optimization.
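
To make the phrase "direct preference optimization" concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the paper: the function names, the `beta` value, and the example log-probabilities are all illustrative assumptions. It assumes each report has already been scored as a sequence log-probability under both the model being trained (the policy) and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, dispreferred) report pair.

    All inputs are sequence log-probabilities. `beta` is an assumed
    illustrative value, not a setting reported by the paper.
    """
    # How much more the policy favors each report than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)


# Hypothetical log-probabilities for one preference pair.
good = dpo_loss(-10.0, -20.0, -15.0, -15.0)  # policy favors the preferred report
bad = dpo_loss(-20.0, -10.0, -15.0, -15.0)   # policy favors the dispreferred report
print(good < bad)
```

The loss falls as the policy raises the preferred report's probability relative to the dispreferred one, measured against the reference model. When the policy and reference agree exactly, the loss is log 2.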
