Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Explanations can be manipulated and geometry is to blame
145
Zitationen
6
Autoren
2019
Jahr
Abstract
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
Ähnliche Arbeiten
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20.305 Zit.
Generative Adversarial Nets
2023 · 19.841 Zit.
Visualizing and Understanding Convolutional Networks
2014 · 15.236 Zit.
"Why Should I Trust You?"
2016 · 14.204 Zit.
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13.103 Zit.