jina-embeddings-v5-text: Task-Targeted Embedding Distillation
Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, Han Xiao
2026-02-18
Summary
This paper focuses on creating better text embedding models, which convert text into numerical representations of its meaning, used for tasks like searching and organizing information.
What's the problem?
Existing methods for training these models often require a lot of computing power or result in models that are too large and slow for some applications. It's difficult to build small, efficient models that still accurately capture meaning, especially for long texts and for text in many different languages.
What's the solution?
The researchers developed a new training method that combines two techniques: 'distillation,' where a smaller model learns to reproduce the behavior of a larger, more capable one, and 'contrastive learning,' which teaches the model to distinguish between similar and dissimilar pieces of text. By using both objectives together, they were able to train compact models that match or exceed the state of the art among models of similar size.
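The summary does not spell out the exact loss, so the PyTorch sketch below only illustrates one common way to combine the two objectives: a distillation term that pulls the student's embeddings toward a frozen teacher's, plus an in-batch InfoNCE contrastive term. The function names, the cosine-based distillation term, the temperature, and the mixing weight `alpha` are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a combined distillation + contrastive objective.
# All hyperparameters and the exact loss forms are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb):
    # Pull each student embedding toward the (frozen) teacher's embedding.
    return 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()

def contrastive_loss(query_emb, doc_emb, temperature=0.05):
    # In-batch InfoNCE: each query's positive document sits on the diagonal;
    # every other document in the batch serves as a negative.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def combined_loss(student_q, student_d, teacher_q, teacher_d, alpha=0.5):
    # Weighted sum of the two objectives; alpha is a hypothetical mixing weight.
    kd = distillation_loss(student_q, teacher_q) + distillation_loss(student_d, teacher_d)
    cl = contrastive_loss(student_q, student_d)
    return alpha * kd + (1.0 - alpha) * cl
```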
Why it matters?
This work is important because it provides a way to build high-quality text embedding models that are small enough to be used on devices with limited resources, like phones or embedded systems. The models they created, jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, are competitive with larger models and can handle long texts in many languages, making them useful for a wider range of applications. Plus, they've made the models publicly available so others can build upon their work.
Abstract
Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically trained with single- or multi-stage processes using contrastive loss functions. We introduce a novel training regimen that combines model distillation techniques with task-specific contrastive loss to produce compact, high-performance embedding models. Our findings suggest that this approach is more effective for training small models than purely contrastive or distillation-based training paradigms alone. Benchmark scores for the resulting models, jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, exceed or match the state-of-the-art for models of similar size. jina-embeddings-v5-text models additionally support long texts (up to 32k tokens) in many languages, and generate embeddings that remain robust under truncation and binary quantization. Model weights are publicly available, and we hope they will inspire further advances in embedding model development.
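As an illustration of the "truncation and binary quantization" robustness the abstract mentions, the NumPy sketch below shows how embeddings are typically shortened to a prefix of their dimensions and reduced to one bit per component. The vector dimensions, the sign-based threshold, and the random stand-in vectors are assumptions for demonstration, not the models' actual configuration.

```python
# Sketch of two compressed ways of using embedding vectors:
# prefix truncation and binary quantization. Dimensions are illustrative.
import numpy as np

def truncate(embeddings: np.ndarray, dims: int) -> np.ndarray:
    # Keep only the first `dims` components, then re-normalize so cosine
    # similarity remains meaningful on the shortened vectors.
    cut = embeddings[:, :dims]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

def binarize(embeddings: np.ndarray) -> np.ndarray:
    # Map each component to a single bit via a sign threshold and pack the
    # bits into bytes; similarity can then be computed via Hamming distance.
    return np.packbits(embeddings > 0, axis=1)

# Stand-in for model output: 4 unit-normalized 1024-dimensional vectors.
emb = np.random.randn(4, 1024).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

small = truncate(emb, 256)
bits = binarize(emb)
print(small.shape, bits.shape)  # (4, 256) (4, 128)
```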