Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views
Tingyang Chen, Cong Fu, Jiahua Wu, Haotian Wu, Hua Fan, Xiangyu Ke, Yunjun Gao, Yabo Ni, Anxiang Zeng
2025-12-17
Summary
This paper introduces Iceberg, a new benchmark for evaluating how well vector similarity search (VSS) works. Instead of only measuring speed and how mathematically *close* the retrieved vectors are, it focuses on how well VSS actually helps with real-world tasks like image search or recommendation.
What's the problem?
Current tests for VSS mainly check how quickly a method finds similar vectors and how many of the truly similar ones it retrieves (its recall). However, this doesn't tell us whether those 'similar' vectors actually help with the bigger picture – like correctly classifying an image or suggesting a product someone will like. The paper argues that focusing only on mathematical similarity can be misleading because it ignores three problems: information lost when converting raw data into vectors, distance metrics that don't really match task relevance, and data that is unevenly distributed across the vector space.
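The gap between mathematical similarity and task relevance can be made concrete with a toy sketch (hypothetical data, not from the paper): a search index can score perfect recall against a distance-based ground truth while the retrieved neighbors are still useless for the downstream label.

```python
# Toy illustration (hypothetical data): recall@k measures agreement with a
# distance-based ground truth, while a task-level check asks whether the
# retrieved neighbors carry the label the application actually needs.

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the k true nearest neighbors that were retrieved."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

def majority_label_correct(retrieved, labels, query_label, k):
    """Does a k-NN majority vote over the retrieved ids recover the query's label?"""
    votes = [labels[i] for i in retrieved[:k]]
    return max(set(votes), key=votes.count) == query_label

# Distance-based ground truth: ids 0..4 are the true 5 nearest neighbors.
ground_truth = [0, 1, 2, 3, 4]
retrieved = [0, 1, 2, 3, 4]  # the index found all of them: recall@5 = 1.0
labels = {0: "cat", 1: "dog", 2: "dog", 3: "dog", 4: "dog"}

print(recall_at_k(retrieved, ground_truth, 5))              # 1.0
print(majority_label_correct(retrieved, labels, "cat", 5))  # False
```

Recall is perfect, yet a k-NN classifier built on those neighbors mislabels the query – exactly the kind of disconnect Iceberg is designed to surface.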
What's the solution?
The researchers created Iceberg, a comprehensive benchmark suite that evaluates VSS performance within complete applications. They identified three key areas where things can go wrong – embedding quality, how well distance metrics reflect real-world relevance, and how sensitive the search index is to the data's distribution. They tested this on eight different datasets, ranging from 1 million to 100 million vectors and covering images, text, and recommendations, and compared 13 state-of-the-art VSS methods. They then re-ranked these methods based on how well they performed in the actual applications, not just on speed and recall.
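The re-ranking idea can be sketched in a few lines (a hypothetical simplification, not the paper's harness): instead of scoring each method by recall, score it by a downstream task metric and sort the methods by that score.

```python
# Hypothetical sketch of task-centric re-ranking: `methods` maps a method name
# to a search function returning neighbor ids; `task_score` judges those ids
# by the downstream task rather than by distance-based overlap.

def rank_methods(methods, queries, k, task_score):
    """Return method names sorted by mean task-level score, best first."""
    results = {}
    for name, search in methods.items():
        scores = [task_score(search(q, k), q) for q in queries]
        results[name] = sum(scores) / len(scores)
    return sorted(results, key=results.get, reverse=True)

# Toy workload: item ids carry class labels; the task metric is whether a
# majority vote over the retrieved ids matches the query's label.
labels = {0: "a", 1: "b", 2: "b", 3: "a", 4: "a"}

def task_score(ids, query_label):
    votes = [labels[i] for i in ids]
    return float(max(set(votes), key=votes.count) == query_label)

methods = {
    "high_recall": lambda q, k: [0, 1, 2],   # overlaps ground truth, mixed labels
    "task_aligned": lambda q, k: [0, 3, 4],  # neighbors share the query's label
}
print(rank_methods(methods, queries=["a"], k=3, task_score=task_score))
# ['task_aligned', 'high_recall']
```

The same mechanism works with any application-level metric – classification accuracy, face-verification rate, or recommendation hit rate – which is how Iceberg's rankings can diverge from recall-latency ones.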
Why it matters?
Iceberg is important because it provides a more realistic evaluation of VSS methods: it shows that the methods that look best under traditional recall-latency tests aren't always the best for specific tasks. The researchers also distilled their findings into an interpretable decision-tree guide that helps practitioners choose and tune the right VSS method based on the characteristics of their data and application, leading to better performance in real-world scenarios.
Abstract
Vector Similarity Search (VSS) in high-dimensional spaces is rapidly emerging as core functionality in next-generation database systems for numerous data-intensive services -- from embedding lookups in large language models (LLMs), to semantic information retrieval and recommendation engines. Current benchmarks, however, evaluate VSS primarily on the recall-latency trade-off against a ground truth defined solely by distance metrics, neglecting how retrieval quality ultimately impacts downstream tasks. This disconnect can mislead both academic research and industrial practice. We present Iceberg, a holistic benchmark suite for end-to-end evaluation of VSS methods in realistic application contexts. From a task-centric view, Iceberg uncovers the Information Loss Funnel, which identifies three principal sources of end-to-end performance degradation: (1) Embedding Loss during feature extraction; (2) Metric Misuse, where distances poorly reflect task relevance; (3) Data Distribution Sensitivity, highlighting index robustness across skews and modalities. For a more comprehensive assessment, Iceberg spans eight diverse datasets across key domains such as image classification, face recognition, text retrieval, and recommendation systems. Each dataset, ranging from 1M to 100M vectors, includes rich, task-specific labels and evaluation metrics, enabling assessment of retrieval algorithms within the full application pipeline rather than in isolation. Iceberg benchmarks 13 state-of-the-art VSS methods and re-ranks them based on application-level metrics, revealing substantial deviations from traditional rankings derived purely from recall-latency evaluations. Building on these insights, we define a set of task-centric meta-features and derive an interpretable decision tree to guide practitioners in selecting and tuning VSS methods for their specific workloads.