Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Hengyi Wang, Shiwei Tan, Hao Wang
2024-06-20

Summary
This paper introduces Probabilistic Concept Explainers (PACE), a method designed to provide trustworthy post-hoc explanations for the predictions made by Vision Transformers (ViTs). The goal is to make it clearer what these models are relying on when they analyze images.
What's the problem?
As Vision Transformers have become popular for processing images, there has been a growing need for reliable ways to explain their predictions. Current methods, such as feature-attribution and conceptual models, often fail to provide clear and trustworthy explanations. This is a problem because without understanding how these models make decisions, users cannot fully trust their outputs, especially in critical applications like healthcare or autonomous driving.
What's the solution?
To address this issue, the authors propose five key criteria that any good explanation method should meet: faithfulness (the explanation should accurately reflect the model's reasoning), stability (the explanation should not change dramatically with small input changes), sparsity (the explanation should focus on the most important features), multi-level structure (it should provide explanations at different levels of detail, from patches to images to the whole dataset), and parsimony (the explanation should be simple and concise). They then introduce PACE, a variational Bayesian framework that models the distributions of ViT patch embeddings to create trustworthy conceptual explanations. The method shows how different parts of an image contribute to the model's predictions, and these patch-level explanations connect explanations at the image level with those at the dataset level.
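PACE itself is a variational Bayesian framework, and its exact inference procedure is not reproduced here. As a rough, hypothetical illustration of the general idea of concept-level explanation from patch embeddings, the sketch below fits a Gaussian mixture over ViT patch embeddings and reads off patch-level and image-level concept activations. The helper names, the number of concepts, and the use of scikit-learn's GaussianMixture are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: cluster ViT patch embeddings into "concepts" and
# summarize them per patch and per image. This is NOT the PACE algorithm,
# only a simplified illustration of concept-level explanation.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_concepts(patch_embeddings, n_concepts=10, seed=0):
    """Fit a Gaussian mixture over all patch embeddings.

    patch_embeddings: array of shape (n_images, n_patches, dim), e.g. the
    per-patch outputs of a pretrained ViT. Each mixture component plays the
    role of one dataset-level "concept".
    """
    n_images, n_patches, dim = patch_embeddings.shape
    flat = patch_embeddings.reshape(-1, dim)
    gmm = GaussianMixture(n_components=n_concepts,
                          covariance_type="diag",
                          random_state=seed).fit(flat)
    return gmm

def explain_image(gmm, image_patches):
    """Return patch-level concept probabilities (n_patches, n_concepts)
    and an image-level explanation (mean concept activation)."""
    patch_concepts = gmm.predict_proba(image_patches)   # patch level
    image_concepts = patch_concepts.mean(axis=0)        # image level
    return patch_concepts, image_concepts

# Example with random data standing in for real ViT embeddings:
# emb = np.random.randn(100, 196, 768)          # (images, patches, dim)
# gmm = fit_concepts(emb, n_concepts=10)
# patch_expl, image_expl = explain_image(gmm, emb[0])
```

Treating mixture components as dataset-level concepts and averaging patch responsibilities per image mirrors, in a simplified way, how patch-level explanations can be aggregated into image-level and dataset-level ones.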
Why it matters?
This research is important because it enhances our understanding of how Vision Transformers work, making them more transparent and trustworthy. By providing better explanations for their predictions, PACE can help users feel more confident in using these advanced AI systems in real-world applications. This could lead to broader adoption of AI technologies in sensitive areas where understanding model decisions is crucial.
Abstract
Vision transformers (ViTs) have emerged as a significant area of focus, particularly for their capacity to be jointly trained with large language models and to serve as robust vision foundation models. Yet, the development of trustworthy explanation methods for ViTs has lagged, particularly in the context of post-hoc interpretations of ViT predictions. Existing sub-image selection approaches, such as feature-attribution and conceptual models, fall short in this regard. This paper proposes five desiderata for explaining ViTs -- faithfulness, stability, sparsity, multi-level structure, and parsimony -- and demonstrates the inadequacy of current methods in meeting these criteria comprehensively. We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. Our qualitative analysis reveals the distributions of patch-level concepts, elucidating the effectiveness of ViTs by modeling the joint distribution of patch embeddings and ViT's predictions. Moreover, these patch-level explanations bridge the gap between image-level and dataset-level explanations, thus completing the multi-level structure of PACE. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that PACE surpasses state-of-the-art methods in terms of the defined desiderata.
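The abstract does not spell out the training objective. As a hedged illustration only, a latent-concept model of this general kind, with patch embeddings $x$, a ViT prediction $y$, and latent concept variables $z$, is typically trained by maximizing a variational lower bound (ELBO) such as the one below; the factorization $p(x, y, z) = p(z)\,p(x \mid z)\,p(y \mid z)$ is an assumption made for illustration, not the paper's exact formulation.

$$
\log p_\theta(x, y) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z) + \log p_\theta(y \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
$$

where $q_\phi(z \mid x)$ is a variational posterior over the latent concepts and $p(z)$ is their prior; maximizing such a bound ties the learned concepts to both the patch embeddings and the model's predictions.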