Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Regularizing Black-box Models for Improved Interpretability
37
Zitationen
6
Autoren
2019
Jahr
Abstract
Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade-off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.
Ähnliche Arbeiten
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 21.065 Zit.
Generative Adversarial Nets
2023 · 19.896 Zit.
Visualizing and Understanding Convolutional Networks
2014 · 15.382 Zit.
"Why Should I Trust You?"
2016 · 14.801 Zit.
Generative adversarial networks
2020 · 13.384 Zit.