This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Classifiers Trained on Differentially Private Synthetic Health Data
Citations: 4
Authors: 5
Year: 2023
Abstract
The release of differentially private (DP) synthetic data has been proposed as a solution for sharing sensitive individual-level medical data for statistical analysis and machine learning model development. The approach holds promise to generate realistic data that preserves many of the statistical properties of the original data while giving privacy guarantees that bound the risk of leaking any sensitive information about the individuals in the data. However, evaluating the generalization of machine learning models trained on DP-synthetic data remains an open question. A model selected based on its accuracy on synthetic data does not necessarily generalize well to real-world data, leading to poor results and incorrect insights. In this study, we experimentally compare two protocols for model evaluation and hyperparameter selection for classifiers trained on DP-synthetic medical data. In the first protocol, we use only synthetic data for model selection and the final evaluation of the selected model, whereas in the second, we assume limited DP access to a private real validation and test set held by the data curator. Our results, based on a comprehensive empirical study, provide novel insights into the practical feasibility and utility of these evaluation protocols for classifiers trained on DP-synthetic data.
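The contrast between the two protocols can be illustrated with a toy sketch. The code below is a hypothetical example, not the paper's actual experimental setup: it uses a 1-D threshold classifier, synthetic validation data whose decision boundary is shifted relative to the real data, and a Laplace-mechanism accuracy query as a stand-in for the curator-mediated DP access assumed in the second protocol. All names, data, and the epsilon value are illustrative assumptions.

```python
import random

def accuracy(threshold, data):
    """Accuracy of a 1-D threshold classifier: predict 1 iff x > threshold."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

def dp_accuracy(threshold, data, epsilon):
    """Laplace-mechanism estimate of accuracy on private data.

    The count of correct predictions has sensitivity 1 (one record changes
    it by at most 1), so adding Laplace noise with scale 1/epsilon makes
    this query epsilon-DP.
    """
    correct = sum((x > threshold) == bool(y) for x, y in data)
    # Difference of two Exponential(epsilon) draws is Laplace(scale=1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return (correct + noise) / len(data)

def select_threshold(candidates, data, score):
    """Hyperparameter selection: keep the candidate scoring highest on data."""
    return max(candidates, key=lambda t: score(t, data))

# Hypothetical data: real labels flip at x = 0.6, but the (imperfect)
# synthetic generator placed the boundary at x = 0.4.
real_val = [(i / 100, int(i / 100 > 0.6)) for i in range(100)]
synthetic_val = [(i / 100, int(i / 100 > 0.4)) for i in range(100)]
candidates = [0.2, 0.4, 0.6, 0.8]

# Protocol 1: select using synthetic data only. This picks the shifted
# boundary 0.4 and misclassifies the real band (0.4, 0.6].
t1 = select_threshold(candidates, synthetic_val, accuracy)

# Protocol 2: one DP accuracy query per candidate against the real
# validation set held by the curator; by basic composition the total
# privacy cost is len(candidates) * epsilon.
random.seed(0)
t2 = select_threshold(candidates, real_val,
                      lambda t, d: dp_accuracy(t, d, epsilon=1.0))

print(t1, accuracy(t1, real_val))  # synthetic-only selection, real accuracy
print(t2, accuracy(t2, real_val))  # DP-query selection, real accuracy
```

Under this setup, the synthetic-only protocol selects the distorted boundary, while the DP-query protocol pays a small privacy budget but recovers the correct one; the paper's empirical study quantifies this trade-off on real medical datasets.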
Related Works
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8,404 citations
Calibrating Noise to Sensitivity in Private Data Analysis
2006 · 6,901 citations
Deep Learning with Differential Privacy
2016 · 5,634 citations
Federated Machine Learning
2019 · 5,604 citations
Communication-Efficient Learning of Deep Networks from Decentralized Data
2016 · 5,595 citations