This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Knowing What You Cannot Explain: Learning to Reject Low-Quality Explanations
Citations: 0
Authors: 4
Year: 2025
Abstract
Learning to Reject (LtR) frameworks allow ML models to abstain from uncertain predictions and promote user trust. However, since current LtR strategies focus solely on predictive performance, they completely neglect explanation quality. Low-quality explanations -- whether they inaccurately reflect the model's reasoning or fail to satisfy users -- can severely compromise trust assessments and induce over-reliance on incorrect predictions. We argue that models should abstain from making a prediction when they cannot offer a satisfactory explanation for it, and we introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates explanation quality. Focusing on popular attribution techniques, we propose REX (REjector of low-quality eXplanations), which learns a rejector from explanation quality labels that combine machine-side judgments with explicit human annotations. Our empirical evaluation demonstrates that REX outperforms popular LtR strategies and baselines relying on isolated explanation metrics. Finally, to support future research, we publicly release a novel, larger-scale dataset of 1,050 human-annotated machine explanations.
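To make the LtX setup described in the abstract concrete, below is a minimal, hypothetical Python sketch of a predictor paired with a rejector that abstains when the explanation for an input scores low on predicted quality. This is not the paper's REX implementation: the toy attribution function, the placeholder quality labels, and the threshold `tau` are all illustrative assumptions.

```python
# Hypothetical sketch of the LtX idea: a predictor plus a rejector that
# abstains when the predicted explanation quality is too low.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

predictor = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

def attribution(model, x):
    # Toy feature attribution: input * weight, a linear-model analogue of
    # gradient-based saliency. A real system would use an attribution
    # technique such as SHAP or Integrated Gradients.
    return x * model.coef_[0]

# Assumed: binary explanation-quality labels (0 = low, 1 = high) for the
# training explanations; in the paper these would combine machine-side
# judgments with human annotations. Random placeholders here.
expl_tr = np.array([attribution(predictor, x) for x in X_tr])
quality_labels = rng.integers(0, 2, size=len(X_tr))

rejector = LogisticRegression(max_iter=1000).fit(expl_tr, quality_labels)

def predict_or_abstain(x, tau=0.5):
    """Return the class prediction, or None (abstain) when the rejector
    deems the explanation for x too low-quality to stand behind."""
    e = attribution(predictor, x).reshape(1, -1)
    if rejector.predict_proba(e)[0, 1] < tau:
        return None  # no satisfactory explanation -> reject
    return predictor.predict(x.reshape(1, -1))[0]

print([predict_or_abstain(x) for x in X_te[:5]])
```

The key design point, per the abstract, is that rejection is driven by explanation quality rather than by predictive uncertainty alone, which is what distinguishes LtX from standard LtR.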
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,702 citations
Generative Adversarial Nets
2014 · 19,895 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,323 citations
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
2016 · 14,544 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,195 citations