Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Smoothed Geometry for Robust Attribution
12
Zitationen
6
Autoren
2020
Jahr
Abstract
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness. Building on a geometric understanding of these attacks presented in recent work, we identify Lipschitz continuity conditions on models' gradient that lead to robust gradient-based attributions, and observe that smoothness may also be related to the ability of an attack to transfer across multiple attribution methods. To mitigate these attacks in practice, we propose an inexpensive regularization method that promotes these conditions in DNNs, as well as a stochastic smoothing technique that does not require re-training. Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models.
Ähnliche Arbeiten
Rethinking the Inception Architecture for Computer Vision
2016 · 30.416 Zit.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24.552 Zit.
CBAM: Convolutional Block Attention Module
2018 · 21.448 Zit.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21.347 Zit.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18.535 Zit.