OpenAlex · Updated hourly · Last updated: 06.04.2026, 01:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

2021 · 5 citations · arXiv (Cornell University) · Open Access
Open full text at the publisher

Citations: 5
Authors: 3
Year: 2021

Abstract

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stake applications. Hundreds of papers have either proposed new feature attribution methods, discussed or harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods were only evaluated on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure attribution map effectiveness in assisting humans in ImageNet classification and Stanford Dogs fine-grained classification, and when an image is natural or adversarial (i.e., contains adversarial perturbations). Overall, feature attribution is surprisingly not more effective than showing humans nearest training-set examples. On a harder task of fine-grained dog categorization, presenting attribution maps to humans does not help, but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found automatic attribution-map evaluation measures to correlate poorly with the actual human-AI team performance. Our findings encourage the community to rigorously test their methods on the downstream human-in-the-loop applications and to rethink the existing evaluation metrics.


Topics

Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education