This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities

2021 · 0 citations · 5 authors · arXiv (Cornell University) · Open Access

Abstract

An important pillar for safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks to afford their deployment in critical applications. A ubiquitous class of safety risks are learned shortcuts, i.e., spurious correlations a network exploits for its decisions that have no semantic connection to the actual task. Networks relying on such shortcuts bear the risk of not generalizing well to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable if access to the network is constrained, in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy to the black-box model of interest. Leveraging the proxy's guarantees on introspection, we automatically extract candidates for learned shortcuts. Their transferability to the black box is validated in a systematic fashion. Concretely, as proxy model we choose a BagNet, which bases its decisions purely on local image patches. We demonstrate on the autonomous driving dataset A2D2 that extracted patch shortcuts significantly influence the black-box model. By efficiently identifying such patch-based vulnerabilities, we contribute to safer ML models.
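
The abstract describes a three-step workflow: rank image patches by the BagNet proxy's local class evidence, take the strongest patches as shortcut candidates, and validate their transferability by pasting them into clean images and observing the black box's output shift. The sketch below illustrates that workflow under loud assumptions: both models are replaced by random stubs, and every function name (`bagnet_patch_logits`, `black_box_predict`, etc.) is hypothetical; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-ins for the two models. A real BagNet yields a spatial
# grid of class logits, one per local image patch; the black box exposes only
# its output probabilities. Both are stubbed with pseudo-random values here.
def bagnet_patch_logits(image: np.ndarray, target_class: int) -> np.ndarray:
    h, w = image.shape[0] // 8, image.shape[1] // 8  # coarse logit grid
    rng = np.random.default_rng(target_class)
    return rng.normal(size=(h, w))

def black_box_predict(image: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(int(image.sum() * 1e6) % 2**32)
    return rng.dirichlet(np.ones(10))  # class probabilities

def extract_shortcut_candidates(image, target_class, patch=33, stride=8, top_k=3):
    """Use the proxy's introspection: rank patches by their class evidence
    and return the strongest ones as shortcut candidates."""
    logits = bagnet_patch_logits(image, target_class)
    flat = np.argsort(logits, axis=None)[::-1][:top_k]  # strongest first
    coords = np.column_stack(np.unravel_index(flat, logits.shape)) * stride
    candidates = []
    for y, x in coords:
        crop = image[y:y + patch, x:x + patch]
        if crop.shape[:2] == (patch, patch):  # skip truncated border crops
            candidates.append(crop.copy())
    return candidates

def validate_on_black_box(images, candidate, target_class):
    """Transferability check: paste the candidate patch into clean images and
    measure the black box's mean confidence shift for the target class."""
    ph, pw = candidate.shape[:2]
    shifts = []
    for img in images:
        before = black_box_predict(img)[target_class]
        perturbed = img.copy()
        perturbed[:ph, :pw] = candidate  # fixed top-left location for brevity
        shifts.append(black_box_predict(perturbed)[target_class] - before)
    return float(np.mean(shifts))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    source = rng.random((96, 96, 3))
    clean = [rng.random((96, 96, 3)) for _ in range(8)]
    for cand in extract_shortcut_candidates(source, target_class=1):
        print("mean confidence shift:", validate_on_black_box(clean, cand, 1))
```

A candidate counts as a transferred shortcut when the measured confidence shift is significant across many clean images; the stubbed models here only demonstrate the control flow, not a real effect.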

Topics

Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education