TokBench: Evaluating Your Visual Tokenizer before Visual Generation

Junfeng Wu, Dongliang Luo, Weizhi Zhao, Zhihao Xie, Yuanhao Wang, Junyi Li, Xudong Xie, Yuliang Liu, Xiang Bai

2025-05-30

Summary

This paper introduces TokBench, a new benchmark for testing how well AI components called visual tokenizers and VAEs can break detailed images down into compact representations and then rebuild them, especially for fine-grained content like text and human faces.

What's the problem?

The problem is that when AI models compress images into tokens or latent codes and then reconstruct them, fine details are often lost, making the results blurry or inaccurate. This matters most for content that demands precision, such as written words and facial features.

What's the solution?

The researchers built TokBench to measure how accurately these models preserve small-scale details, such as readable text and recognizable faces, when reconstructing images. They found that current tokenizers and VAEs have clear weaknesses here, and that standard image-quality metrics are not enough to judge this kind of fine-grained fidelity.
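To give a rough sense of what "measuring reconstruction fidelity" means in practice, here is a minimal sketch of one common pixel-level metric, PSNR, applied to a toy image crop and a noisy stand-in for its reconstruction. This is an illustrative example only; it is not TokBench's actual evaluation pipeline, which the paper reports uses specialized, fine-grained measures for text and faces.

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: higher means a more faithful reconstruction."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a random "text crop" and a slightly degraded reconstruction of it.
rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
degraded = np.clip(crop + rng.normal(0, 5, size=crop.shape), 0, 255)

print(f"PSNR of degraded crop: {psnr(crop, degraded):.1f} dB")
```

A key point the paper makes is that pixel-level scores like this can look fine even when small text becomes unreadable, which is why region-focused, content-aware evaluation is needed.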

Why it matters?

This is important because better detail preservation in image reconstruction would make downstream applications, from photo editing to face recognition and digital art, noticeably more accurate and useful.

Abstract

Evaluation of visual tokenizers and VAEs on fine-grained feature reconstruction, focusing on text and faces, reveals limitations in preserving detailed visual content and highlights the need for specialized metrics.