Key Features

End-to-end OCR pipeline combining detection, recognition, parsing, and translation
Lightweight architecture with 1 billion parameters for efficient GPU deployment
Multilingual support for over 100 languages and mixed-language documents
Complex document parsing with tables, formulas, and structured layouts
Preservation of reading order and native aspect ratio for long or dense pages
Adaptive token compression for focusing on text-heavy regions

With only about 1 billion parameters, Hunyuan OCR runs efficiently on modern GPUs, making it practical for both research and production use. It can be served through either vLLM or Transformers, so developers can pick the inference path that best fits their throughput, latency, or customization needs. Its native multimodal design processes visual and textual information together, which helps on tasks that mix structured and unstructured content. For example, it can output mathematical expressions as LaTeX and complex tables as HTML.
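
Because tables come back as HTML, downstream code usually has to turn that markup into rows before storing or analyzing it. The following minimal sketch uses only Python's standard-library `html.parser`. The `sample` string is a hypothetical model output, not real Hunyuan OCR output. The helper names `TableRowParser` and `html_table_to_rows` are illustrative. The sketch also assumes simple tables without `rowspan`/`colspan`.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments (text may be split by nested tags such as <b>).
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_rows(html: str) -> list[list[str]]:
    """Convert a simple HTML table (no rowspan/colspan) into a list of rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical model output for a small invoice table.
sample = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Paper</td><td>2</td></tr></table>"
)
print(html_table_to_rows(sample))  # [['Item', 'Qty'], ['Paper', '2']]
```

Real model output may include merged cells or other attributes, which this sketch silently ignores.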

Hunyuan OCR supports more than 100 languages and handles documents that mix several languages on the same page with high accuracy. It is built for varied real-world tasks, including document parsing, subtitle extraction, photo translation, and invoice field extraction. Native-resolution encoding preserves the original aspect ratio and fine detail, while adaptive token compression concentrates capacity on text-heavy regions. Together, these features help produce clean results on long receipts, dense pages, and low-quality scans. Because the model is open source and posts strong benchmark scores, it is a compelling choice for OCR applications that need both efficiency and versatility.
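
In vision-language models, native-resolution encoding generally means cutting the image into fixed-size patches laid out in a grid that follows the image's own shape, rather than resizing everything to a square. The sketch below shows one generic way to compute such a grid under a token budget. The hypothetical `patch_grid` helper, the 16-pixel patch size, the 4096-token cap, and the uniform-downscaling rule are all illustrative assumptions. They are not Hunyuan OCR's documented settings or its adaptive compression method.

```python
from __future__ import annotations

import math


def patch_grid(width: int, height: int, patch: int = 16, max_tokens: int = 4096) -> tuple[int, int]:
    """Hypothetical helper: return (cols, rows) of patches for an image.

    The grid follows the image's aspect ratio. If it would exceed the token
    budget, both sides are shrunk by the same factor so the shape is kept.
    Patch size and budget are illustrative values only.
    """
    cols = max(1, math.ceil(width / patch))
    rows = max(1, math.ceil(height / patch))
    if cols * rows > max_tokens:
        # Uniform scale s with (cols*s)*(rows*s) == max_tokens; flooring keeps us under it.
        scale = math.sqrt(max_tokens / (cols * rows))
        cols = max(1, math.floor(cols * scale))
        rows = max(1, math.floor(rows * scale))
    return cols, rows


# A tall receipt scan keeps roughly its 1:5 shape instead of being squashed square.
print(patch_grid(800, 4000))  # (28, 143)
```

Scaling both sides by one factor keeps the long, narrow shape of a receipt intact. Squashing it into a square would blur the thin lines of text that OCR depends on.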
