OpenAlex · Updated hourly · Last updated: 14.03.2026, 15:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods.

2020 · 54 citations · Apollo (University of Cambridge) · Open Access
Open full text at the publisher

54 citations · 4 authors · Year: 2020

Abstract

Transparency of algorithmic systems is an important area of research, which has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [23], even suggests that model explanations can answer the question "Why should I trust you?". Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model's discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check fairness of a model.
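The abstract describes the attack only at a high level; the paper's actual construction is in the full text. As a hedged toy illustration of how a perturbation-based explainer can be fooled in principle (the model names, the Gaussian-noise explainer, and the off-manifold detection trick below are all illustrative assumptions, not this paper's exact method), consider a model that discriminates on a sensitive feature for real binary inputs but answers innocently on the off-distribution samples an explainer probes with:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: feature 0 is "sensitive", feature 1 is innocuous.
# Real inputs are exactly binary (0.0 or 1.0).

def biased_model(X):
    # Discriminates purely on the sensitive feature.
    return (X[:, 0] > 0.5).astype(float)

def innocent_model(X):
    # Depends only on the innocuous feature.
    return (X[:, 1] > 0.5).astype(float)

def on_manifold(X):
    # Perturbation samples drawn by a LIME-style explainer (here:
    # Gaussian noise around x) are almost never exactly binary.
    return np.all(np.isin(X, (0.0, 1.0)), axis=1)

def adversarial_model(X):
    # Discriminate on real (binary) inputs, but answer innocently on
    # the off-manifold samples the explainer probes with.
    out = innocent_model(X)
    real = on_manifold(X)
    out[real] = biased_model(X[real])
    return out

def perturbation_importance(model, x, n=2000, sigma=0.3):
    # Crude LIME-like importance: |correlation| between each feature's
    # perturbed values and the model's output near x.
    Xp = x[None, :] + rng.normal(0.0, sigma, size=(n, x.size))
    y = model(Xp)
    return np.array([abs(np.corrcoef(Xp[:, j], y)[0, 1])
                     for j in range(x.size)])

x = np.array([1.0, 1.0])
honest = perturbation_importance(biased_model, x)       # blames feature 0
fooled = perturbation_importance(adversarial_model, x)  # blames feature 1
```

On every real (binary) input the adversarial model's predictions match the biased model's exactly, yet the explainer attributes its behavior to the innocuous feature, which is the kind of failure that makes such explanations unreliable as a fairness check.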

Topics

Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education