This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Towards explainable model extraction attacks

2022 · 8 citations · International Journal of Intelligent Systems

Abstract

One key factor in boosting applications of artificial intelligence (AI) in security-sensitive domains is to use AI responsibly, which includes providing explanations for its decisions. To date, a plethora of explainable artificial intelligence (XAI) techniques have been proposed to help users interpret model decisions. However, because explanations are data-driven, they themselves carry a high risk of exposing private information. In this paper, we first show that existing XAI techniques are vulnerable to model extraction attacks and then present an XAI-aware dual-task model extraction attack (DTMEA). DTMEA attacks a target model that offers explanation services; that is, it extracts both the classification and explanation tasks of the target model. More specifically, the substitute model extracted by DTMEA is a multitask learning architecture, consisting of a shared layer and two task-specific layers for classification and explanation. To reveal which explanation techniques are more prone to exposing private information, we conduct an empirical evaluation of four major explanation types on benchmark data sets. Experimental results show that the attack accuracy of DTMEA exceeds that of the predicted-only method by up to 1.25%, 1.53%, 9.25%, and 7.45% on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively. By exposing potential threats to explanation technologies, our research offers insights for developing effective tools that can trade off security-sensitive relationships.
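
The multitask structure described in the abstract (one shared layer feeding a classification head and an explanation head) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `DualTaskSubstitute`, the layer sizes, the choice of cross-entropy plus mean squared error, and the weighting factor `lam` are all assumptions made for illustration, and no training loop is shown.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualTaskSubstitute:
    """Hypothetical substitute model: shared layer plus two task heads."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_in, rng)
        self.cls_head = init_layer(n_classes, n_hidden, rng)
        # Assumption: the explanation is a per-feature attribution vector,
        # so the explanation head has one output per input feature.
        self.exp_head = init_layer(n_in, n_hidden, rng)

    def forward(self, x):
        h = relu(linear(*self.shared, x))            # shared representation
        probs = softmax(linear(*self.cls_head, h))   # classification task
        explanation = linear(*self.exp_head, h)      # explanation task
        return probs, explanation


def dual_task_loss(probs, explanation, target_label, target_explanation, lam=1.0):
    """Assumed joint objective: cross-entropy on the label returned by the
    target model plus lam times the MSE on the explanation it returned."""
    cls_loss = -math.log(probs[target_label])
    exp_loss = sum((e - t) ** 2
                   for e, t in zip(explanation, target_explanation)) / len(explanation)
    return cls_loss + lam * exp_loss


# One query: x is sent to the target, which answers with a label and an explanation.
model = DualTaskSubstitute(n_in=4, n_hidden=8, n_classes=3)
probs, explanation = model.forward([0.2, 0.5, 0.1, 0.9])
loss = dual_task_loss(probs, explanation,
                      target_label=1,
                      target_explanation=[0.0, 0.3, 0.0, 0.7])
```

In this sketch, setting `lam=0.0` reduces the objective to the classification loss alone, which loosely corresponds to an attack that uses only the target's predictions and ignores its explanations.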

Topics

Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
