Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, Jie Zhou

2025-06-16

Summary

This paper introduces CapRetrieval, a dataset designed to test how well the text encoders used in dense retrieval recognize fine-grained details in text, such as specific objects, people, or events mentioned in image captions. Dense retrievers find information by matching the meaning of text rather than its exact words. The researchers found that these retrievers often fail even on simple queries because they struggle to pick up such small but important details.

What's the problem?

Dense retrievers compress each text into an embedding, a compact vector representation, and that compression can lose the fine details that distinguish one entity or event from another. As a result, the retriever returns wrong or incomplete results when a query asks about a specific detail. This is a serious problem for tasks such as searching images by detailed descriptions or answering questions about a specific part of a text.
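The effect described above can be illustrated with a toy sketch. It is not the paper's method or data: the three-dimensional vectors, their axis meanings, and the helper names `cosine` and `rank` are all assumptions made up for illustration. The sketch ranks two hypothetical captions by cosine similarity and shows that when an embedding is dominated by the overall scene, a caption that mentions the queried detail scores only slightly higher than one that does not.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query_vec, passage_vecs):
    """Return passage indices sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    return [idx for _, idx in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Hypothetical, hand-picked embeddings; axes are [beach scene, street scene, dog].
# Passage 0 stands for a beach caption that briefly mentions a dog,
# passage 1 for a beach caption with no dog at all.
passages = [
    [0.95, 0.0, 0.05],  # the dog detail barely registers in the vector
    [1.00, 0.0, 0.00],
]
query = [0.7, 0.0, 0.7]  # stands for a query like "a dog on the beach"

scores = [cosine(query, p) for p in passages]
margin = scores[0] - scores[1]
order = rank(query, passages)
print(order, round(margin, 4))
```

In this toy setup the correct caption still ranks first, but only by a narrow margin, so a small shift in either vector could flip the order. Real encoders work in hundreds of dimensions with learned axes, so this only conveys the intuition.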

What's the solution?

The researchers built the CapRetrieval dataset, in which the passages are image captions and the queries ask about entities or events in various forms. They evaluated existing text encoders on this dataset and found that many fail to capture fine-grained information. To improve performance, they proposed new ways to generate training data and fine-tune the encoders so that the embeddings better represent small but important details in the text, which leads to more accurate retrieval results.

Why does it matter?

This matters because dense retrieval is widely used in systems that need to quickly find relevant information from large collections of text, like search engines or AI assistants. If the dense retrievers miss fine details, the answers or results they provide can be wrong or misleading. By studying this granularity dilemma and improving how embeddings handle fine-grained details, the paper helps make retrieval systems more precise, especially in real-world tasks like image search or question answering.

Abstract

A new dataset named CapRetrieval is introduced to evaluate the ability of text encoders to recognize fine-grained entities and events, highlighting challenges in dense retrieval tasks.
