Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations
Yehonatan Elisha, Seffi Cohen, Oren Barkan, Noam Koenigstein
2025-11-24
Summary
This paper is about how we understand *why* AI makes the decisions it does, specifically when looking at images. It points out that current methods for explaining AI decisions, called saliency maps, aren't always helpful because they don't consider what the user actually wants to know.
What's the problem?
Currently, there's a lot of confusion about what saliency maps are *supposed* to do. Do they explain why an AI chose one thing over everything else, or why it chose one thing over a specific alternative? And do they explain things at a detailed level (like, 'why this specific breed of dog?') or at a broader level (like, 'why a dog at all?')? Because of this lack of clarity, it's hard to tell whether these explanations are actually good or whether they're just highlighting arbitrary parts of the image.
What's the solution?
The researchers created a new way to categorize explanations called 'RFxG'. This framework looks at explanations from two angles: the 'reference frame' – whether the explanation justifies a single prediction on its own or compares it against an alternative – and the 'granularity' – how fine- or coarse-grained the explained class is. Building on this taxonomy, they developed new faithfulness metrics and used them to evaluate ten explanation methods across four model architectures and three image datasets.
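To make the two axes concrete, here is a minimal sketch, assuming a standard PyTorch image classifier, of how the different explanation targets could be expressed as functions of the model's output. The function names, the plain input-gradient saliency, and the example class indices are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def pointwise_target(logits, cls):
    """Pointwise query ('Why this prediction?'): the score of a single class."""
    return logits[:, cls]

def contrastive_target(logits, cls, alt):
    """Contrastive query ('Why this and not that?'): the margin between two classes."""
    return logits[:, cls] - logits[:, alt]

def group_target(logits, group):
    """Group-level query ('Why this group, e.g., any dog?'): probability mass of a class group."""
    return F.softmax(logits, dim=-1)[:, group].sum(dim=-1)

def gradient_saliency(model, image, target_fn, **kwargs):
    """Plain input-gradient heatmap for whichever target the user's question implies."""
    image = image.detach().clone().requires_grad_(True)
    target_fn(model(image), **kwargs).sum().backward()
    return image.grad.abs().sum(dim=1)  # collapse channels -> (B, H, W) heatmap

# Hypothetical usage with ImageNet-style indices (illustrative only):
# sal_contrastive = gradient_saliency(model, img, contrastive_target, cls=250, alt=269)       # Husky vs. Wolf
# sal_group       = gradient_saliency(model, img, group_target, group=list(range(151, 269)))  # any dog
```

Under a framing like this, switching from a pointwise to a contrastive or group-level question only swaps the target function; the saliency method itself stays unchanged, which is what makes it possible to score the same map against different user intents.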
Why it matters?
This work is important because it pushes the field of AI explainability to focus on what *users* need to know, not just what's easy to measure. By providing a better way to evaluate explanations, it will help developers create AI systems that are more transparent and trustworthy, ultimately making AI more useful and understandable for everyone.
Abstract
Saliency maps are widely used for visual explanations in deep learning, but a fundamental lack of consensus persists regarding their intended purpose and alignment with diverse user queries. This ambiguity hinders the effective evaluation and practical utility of explanation methods. We address this gap by introducing the Reference-Frame × Granularity (RFxG) taxonomy, a principled conceptual framework that organizes saliency explanations along two essential axes: Reference-Frame, distinguishing between pointwise ("Why this prediction?") and contrastive ("Why this and not an alternative?") explanations; and Granularity, ranging from fine-grained class-level (e.g., "Why Husky?") to coarse-grained group-level (e.g., "Why Dog?") interpretations. Using the RFxG lens, we demonstrate critical limitations in existing evaluation metrics, which overwhelmingly prioritize pointwise faithfulness while neglecting contrastive reasoning and semantic granularity. To systematically assess explanation quality across both RFxG dimensions, we propose four novel faithfulness metrics. Our comprehensive evaluation framework applies these metrics to ten state-of-the-art saliency methods, four model architectures, and three datasets. By advocating a shift toward user-intent-driven evaluation, our work provides both the conceptual foundation and the practical tools necessary to develop visual explanations that are not only faithful to the underlying model behavior but are also meaningfully aligned with the complexity of human understanding and inquiry.
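As a rough illustration of how faithfulness could be probed along both RFxG axes, a deletion-style check can reuse the target functions sketched above: occlude the most salient pixels in stages and watch how the chosen target score degrades. This is a generic perturbation protocol under assumed PyTorch conventions, not a reproduction of the paper's four metrics.

```python
import torch

@torch.no_grad()
def deletion_curve(model, image, saliency, target_fn, steps=10, **kwargs):
    """Zero out the most-salient pixels in stages and record how the chosen
    target (pointwise, contrastive, or group-level) degrades; a map faithful
    to that target should show a steep early drop."""
    B, C, H, W = image.shape
    order = saliency.flatten(1).argsort(dim=1, descending=True)  # most salient pixels first
    curve = []
    for step in range(steps + 1):
        k = round(step / steps * H * W)
        masked = image.flatten(2).clone()                        # (B, C, H*W)
        if k > 0:
            idx = order[:, :k].unsqueeze(1).expand(-1, C, -1)
            masked.scatter_(2, idx, 0.0)                         # occlude the top-k pixels
        curve.append(target_fn(model(masked.view(B, C, H, W)), **kwargs))
    return torch.stack(curve, dim=1)                             # (B, steps + 1) degradation curve
```

Comparing the curve obtained with a pointwise target against the one obtained with a contrastive or group-level target, for the same heatmap, gives a rough sense of whether the map actually answers the question it is presented as answering.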