Key Features

End-to-end vision-language model
Fully differentiable for fine-tuning
Handles tables, receipts, forms, and math notation
Predicts bounding boxes for embedded images
Supports domain adaptation and multilingual fine-tuning
Efficient and fast performance
State-of-the-art performance on OlmOCR-Bench
Versatile tool for document understanding

The model is fully differentiable, allowing for fine-tuning and supporting various tasks such as domain adaptation and multilingual fine-tuning. It can handle tables, receipts, forms, multi-column layouts, and math notation, making it a versatile tool for document understanding. The model also predicts bounding boxes for embedded images, enhancing its functionality.


LightOnOCR-2-1B is part of a model family that includes variants for specific tasks, such as base models for fine-tuning and models with image bounding boxes. The model is available for use with transformers and can be deployed using vLLM. It has been trained on a large and high-quality corpus, resulting in improved performance and efficiency. The model's capabilities make it suitable for a wide range of applications.

Get more likes & reach the top of search results by adding this button on your site!

Embed button preview - Light theme
Embed button preview - Dark theme
TurboType Banner
Zero to AI Engineer Program

Zero to AI Engineer

Skip the degree. Learn real-world AI skills used by AI researchers and engineers. Get certified in 8 weeks or less. No experience required.

Subscribe to the AI Search Newsletter

Get top updates in AI to your inbox every weekend. It's free!