This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods.
Citations: 54
Authors: 4
Year: 2020
Abstract
Transparency of algorithmic systems is an important area of research, which has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [23], even suggests that model explanations can answer the question “Why should I trust you?”. Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model’s discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check fairness of a model.
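The abstract describes the attack only at a high level. A minimal sketch of how such an explanation attack could look, assuming a differentiable classifier on tabular data in PyTorch: fine-tune the pre-trained model with the usual task loss plus a penalty on the input gradient at the sensitive feature, so gradient-based saliency methods report low importance for that feature while accuracy stays close to the original. The function name conceal_feature, the penalty weight alpha, and the choice of the summed logits as the saliency target are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def conceal_feature(model, loader, sensitive_idx, alpha=10.0, lr=1e-4, epochs=1):
    """Sketch of an explanation attack: fine-tune a pre-trained classifier
    so gradient-based feature-importance methods assign low importance to
    one (sensitive) input feature, while the task loss preserves accuracy.
    All hyperparameters here are illustrative, not the paper's values."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:          # x: (batch, n_features), y: class labels
            x.requires_grad_(True)   # track gradients w.r.t. the inputs
            logits = model(x)
            task_loss = F.cross_entropy(logits, y)
            # Input gradient of the (summed) logits; its sensitive-feature
            # column is what many saliency explanations report.
            grads = torch.autograd.grad(logits.sum(), x, create_graph=True)[0]
            attack_loss = grads[:, sensitive_idx].abs().mean()
            loss = task_loss + alpha * attack_loss
            opt.zero_grad()
            loss.backward()          # second-order grads via create_graph
            opt.step()
    return model
```

The penalty only suppresses the gradient at the sensitive feature; because features are often correlated, the model can keep its predictive behavior largely unchanged, which is why accuracy changes little while the reported importances shift.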
Similar Works
Rethinking the Inception Architecture for Computer Vision
2016 · 30,316 citations
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24,385 citations
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21,292 citations
CBAM: Convolutional Block Attention Module
2018 · 21,257 citations
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18,488 citations