This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities
0
Citations
5
Authors
2021
Year
Abstract
An important pillar for safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks to afford their deployment in critical applications. A ubiquitous class of safety risks are learned shortcuts, i.e., spurious correlations a network exploits for its decisions that have no semantic connection to the actual task. Networks relying on such shortcuts bear the risk of not generalizing well to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable if access to the network is constrained, in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy to the black-box model of interest. Leveraging the proxy's guarantees on introspection, we automatically extract candidates for learned shortcuts. Their transferability to the black box is validated in a systematic fashion. Concretely, as the proxy model we choose a BagNet, which bases its decisions purely on local image patches. We demonstrate on the autonomous driving dataset A2D2 that extracted patch shortcuts significantly influence the black-box model. By efficiently identifying such patch-based vulnerabilities, we contribute to safer ML models.
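The workflow the abstract describes (score patches with an interpretable, BagNet-style proxy, extract the highest-evidence patches as shortcut candidates, then check whether pasting them into inputs shifts the black box's output) can be illustrated with a toy sketch. This is not the paper's code; all function names (`proxy_patch_scores`, `top_patch_candidates`, `transfer_test`) and the toy scoring rule are hypothetical, and real images and models replace the plain 2D number grids used here.

```python
def proxy_patch_scores(image, patch=2):
    """Interpretable proxy: score each patch from local evidence only
    (BagNet-style, ignoring the rest of the image). The toy 'evidence'
    here is simply the patch's mean value; a real proxy would use the
    patch-wise class logits of a trained BagNet."""
    h, w = len(image), len(image[0])
    scores = {}
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            vals = [image[i + di][j + dj]
                    for di in range(patch) for dj in range(patch)]
            scores[(i, j)] = sum(vals) / len(vals)
    return scores

def top_patch_candidates(images, k=2, patch=2):
    """Extract shortcut candidates: the k patches with the highest
    proxy evidence across a set of images."""
    all_patches = []
    for img in images:
        for (i, j), s in proxy_patch_scores(img, patch).items():
            crop = [row[j:j + patch] for row in img[i:i + patch]]
            all_patches.append((s, crop))
    all_patches.sort(key=lambda t: t[0], reverse=True)
    return [crop for _, crop in all_patches[:k]]

def paste(image, crop, i, j):
    """Return a copy of image with crop pasted at position (i, j)."""
    out = [row[:] for row in image]
    for di, row in enumerate(crop):
        for dj, v in enumerate(row):
            out[i + di][j + dj] = v
    return out

def transfer_test(black_box, image, candidates, pos=(0, 0)):
    """Validate transferability: how much does pasting each candidate
    patch into the image change the black box's score?"""
    base = black_box(image)
    return [black_box(paste(image, c, *pos)) - base for c in candidates]
```

A large score shift in `transfer_test` indicates that a patch found via the proxy also acts as a shortcut for the black box, which is the transferability check the abstract refers to.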
Similar Works
Rethinking the Inception Architecture for Computer Vision
2016 · 30,404 citations
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24,528 citations
CBAM: Convolutional Block Attention Module
2018 · 21,428 citations
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21,341 citations
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18,530 citations