This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A Trustworthy Counterfactual Explanation Method With Latent Space Smoothing
2
Citations
5
Authors
2024
Year
Abstract
Despite the large-scale adoption of Artificial Intelligence (AI) models in healthcare, there is an urgent need for trustworthy tools that rigorously trace model decisions so that the models behave reliably. Counterfactual explanations take a counter-intuitive approach that lets users explore "what if" scenarios, and they are gradually becoming popular in the trustworthy-AI field. However, most previous work on counterfactual explanation cannot credibly generate in-distribution attributions, produces adversarial examples, or fails to give a confidence interval for the explanation. Hence, in this paper, we propose a novel approach that generates counterfactuals in a locally smooth, directed semantic embedding space and, at the same time, gives an uncertainty estimate for the counterfactual generation process. Specifically, we identify a low-dimensional directed semantic embedding space by applying Principal Component Analysis (PCA) within a differential generative model. We then propose a latent-space smoothing regularization that keeps the counterfactual search in-distribution, so that visually imperceptible changes are more robust to adversarial perturbations. Moreover, we put forth an uncertainty estimation framework for evaluating counterfactual uncertainty. Extensive experiments on several challenging, realistic Chest X-ray and CelebA datasets show that our approach performs consistently well and outperforms several existing state-of-the-art baseline approaches.
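The abstract outlines two mechanisms: semantic directions obtained via PCA in a generative model's latent space, and a search along those directions to flip a classifier's decision. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' method or code: the generator and classifier are toy linear stand-ins, and all names and parameters are assumptions made so the script runs on its own.

```python
# Hypothetical sketch: (i) find candidate semantic directions by running PCA over
# latent codes of a generative model, (ii) greedily step along those directions
# until the classifier's predicted class flips. Toy stand-ins, not the paper's code.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent_dim, feat_dim, n_codes, n_dirs = 64, 32, 1000, 8

W_gen = rng.normal(size=(latent_dim, feat_dim))   # toy generator: G(z) = z @ W_gen
w_clf = rng.normal(size=feat_dim)                 # toy classifier logit: f(x) = x @ w_clf

def generate(z):
    return z @ W_gen

def prob(z):
    """Probability of class 1 for the image generated from latent code z."""
    return 1.0 / (1.0 + np.exp(-(generate(z) @ w_clf)))

# 1) Fit PCA over a set of latent codes (random stand-ins for encoded data here)
#    to obtain a small set of candidate semantic directions.
Z = rng.normal(size=(n_codes, latent_dim))
directions = PCA(n_components=n_dirs).fit(Z).components_   # shape (n_dirs, latent_dim)

# 2) Greedy counterfactual search: at each step, move along whichever direction
#    (and sign) pushes the predicted probability fastest toward the opposite class.
z = rng.normal(size=latent_dim)
target = 0 if prob(z) > 0.5 else 1
step = 0.1
score = (lambda c: prob(c)) if target == 1 else (lambda c: -prob(c))
for _ in range(500):
    if (prob(z) > 0.5) == (target == 1):
        break                                      # decision flipped: counterfactual found
    candidates = [z + s * d for d in directions for s in (step, -step)]
    z = max(candidates, key=score)

print(f"predicted probability of class 1 after the search: {prob(z):.3f}")
```

The paper's latent-space smoothing regularization and uncertainty estimation are not reproduced here; this only shows the PCA-direction search component in isolation.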
Related Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,311 citations
Generative Adversarial Nets
2023 · 19,841 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,238 citations
"Why Should I Trust You?"
2016 · 14,210 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,104 citations