TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
Özay Ezerceli, Mahmoud El Hussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu, Yusuf Çelebi, Yağız Asker
2025-11-21
Summary
This paper investigates how well different methods for finding information work for Turkish, a morphologically rich language with far fewer digital resources than languages like English.
What's the problem?
Current information retrieval systems often perform well in languages with lots of data, but struggle with languages like Turkish that have complex grammar and fewer digital resources. While simpler 'dense' methods are commonly used for Turkish search, more detailed 'late-interaction' methods haven't been systematically tested to see whether they can do better, especially given their potential to be smaller and faster.
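The core difference between the two approaches can be illustrated with a toy sketch (not code from the paper): a dense bi-encoder collapses each text into a single vector and compares them with one dot product, while a late-interaction model keeps one vector per token and scores a query-document pair with ColBERT-style MaxSim. The embeddings below are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dense_score(q_vec, d_vec):
    # Dense bi-encoder: one vector per text, a single dot product.
    return dot(q_vec, d_vec)

def maxsim_score(q_tokens, d_tokens):
    # ColBERT-style late interaction: for each query token embedding,
    # take its maximum similarity over all document token embeddings,
    # then sum those maxima over the query tokens.
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

q = [[1.0, 0.0], [0.0, 1.0]]                 # two query token embeddings
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # three document token embeddings
print(maxsim_score(q, doc))                  # 0.9 + 0.8 = 1.7
```

Because MaxSim matches each query token against its best document token, it can reward fine-grained overlaps (e.g. a single rare word) that a pooled dense vector would average away.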
What's the solution?
The researchers created a new benchmark called TurkColBERT to compare these approaches. They started with existing models trained on English and other languages, then adapted them to Turkish by fine-tuning them on Turkish language understanding tasks. Next, they converted these models into late-interaction retrievers in the ColBERT style, using a library called PyLate and a Turkish dataset called MS MARCO-TR. They tested ten models on five Turkish datasets covering topics like science, finance, and argumentation, and also compared different algorithms for quickly indexing and searching the data.
Why it matters?
This work is important because it shows that late-interaction models can be surprisingly effective for Turkish information retrieval, even with limited resources. The researchers found that much smaller late-interaction models could match or even beat far larger 'dense' models, which matters for making search faster and more efficient. The released tools and datasets will help other researchers improve search technology for Turkish and similarly low-resource languages.
Abstract
Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently dominate Turkish IR, yet late-interaction models -- which retain token-level representations for fine-grained matching -- have not been systematically evaluated. We introduce TurkColBERT, the first comprehensive benchmark comparing dense encoders and late-interaction models for Turkish retrieval. Our two-stage adaptation pipeline fine-tunes English and multilingual encoders on Turkish NLI/STS tasks, then converts them into ColBERT-style retrievers using PyLate trained on MS MARCO-TR. We evaluate 10 models across five Turkish BEIR datasets covering scientific, financial, and argumentative domains. Results show strong parameter efficiency: the 1.0M-parameter colbert-hash-nano-tr is 600× smaller than the 600M-parameter turkish-e5-large dense encoder while preserving over 71% of its average mAP. Late-interaction models that are 3-5× smaller than dense encoders significantly outperform them; ColmmBERT-base-TR yields up to +13.8% mAP on domain-specific tasks. For production readiness, we compare indexing algorithms: MUVERA+Rerank is 3.33× faster than PLAID and offers a +1.7% relative mAP gain. This enables low-latency retrieval, with ColmmBERT-base-TR achieving 0.54 ms query times under MUVERA. We release all checkpoints, configs, and evaluation scripts. Limitations include reliance on moderately sized datasets (≤50K documents) and translated benchmarks, which may not fully reflect real-world Turkish retrieval conditions; larger-scale MUVERA evaluations remain necessary.
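The MUVERA+Rerank setup in the abstract follows a common two-stage pattern: score every document cheaply with a single fixed-dimensional vector, then rerank only a small shortlist with exact MaxSim. The sketch below illustrates that pattern in pure Python; it is not the released code, and the mean-pooled vector is only a stand-in for MUVERA's fixed-dimensional encoding. All names and data are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_pool(tokens):
    # Stand-in for a fixed-dimensional single-vector encoding.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def maxsim(q_tokens, d_tokens):
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

def retrieve(query_tokens, corpus, k=2):
    q_vec = mean_pool(query_tokens)
    # Stage 1: cheap single-vector scoring over the whole corpus.
    shortlist = sorted(corpus,
                       key=lambda doc: dot(q_vec, mean_pool(corpus[doc])),
                       reverse=True)[:k]
    # Stage 2: exact (and more expensive) MaxSim only on the shortlist.
    return max(shortlist, key=lambda doc: maxsim(query_tokens, corpus[doc]))

corpus = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],
    "d2": [[0.5, 0.5], [0.5, 0.5]],
    "d3": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0]]
print(retrieve(query, corpus))  # d1
```

The speed advantage reported for MUVERA+Rerank comes from confining the token-level MaxSim computation to the shortlist, while the reranking step recovers (and here slightly improves) the quality of exact late-interaction scoring.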