This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Classifiers Trained on Differentially Private Synthetic Health Data
Citations: 4
Authors: 5
Year: 2023
Abstract
The release of differentially private (DP) synthetic data has been proposed as a solution for sharing sensitive individual-level medical data for statistical analysis and machine learning model development. The approach holds promise to generate realistic data that preserves many of the statistical properties of the original data while giving privacy guarantees that bound the risk of leaking any sensitive information about the individuals in the data. However, evaluating the generalization of machine learning models trained on DP-synthetic data remains an open question. A model selected based on its accuracy on synthetic data does not necessarily generalize well to real-world data, leading to poor results and incorrect insights. In this study, we experimentally compare two protocols for model evaluation and hyperparameter selection for classifiers trained on DP-synthetic medical data. In the first protocol, we use only synthetic data for model selection and the final evaluation of the selected model, whereas in the second, we assume limited DP access to a private real validation and test set held by the data curator. Our results, based on a comprehensive empirical study, provide novel insights into the practical feasibility and utility of these evaluation protocols for classifiers trained on DP-synthetic data.
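The contrast between the two protocols can be illustrated with a toy sketch. The code below is a hypothetical example, not the paper's actual experimental setup: it uses a 1-D threshold classifier, synthetic validation data whose decision boundary is shifted relative to the real data, and a Laplace-mechanism accuracy query as a stand-in for the curator-mediated DP access assumed in the second protocol. All names, data, and the epsilon value are illustrative assumptions.

```python
import random

def accuracy(threshold, data):
    """Accuracy of a 1-D threshold classifier: predict 1 iff x > threshold."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

def dp_accuracy(threshold, data, epsilon):
    """Laplace-mechanism estimate of accuracy on private data.

    The count of correct predictions has sensitivity 1 (one record changes
    it by at most 1), so adding Laplace noise with scale 1/epsilon makes
    this query epsilon-DP.
    """
    correct = sum((x > threshold) == bool(y) for x, y in data)
    # Difference of two Exponential(epsilon) draws is Laplace(scale=1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return (correct + noise) / len(data)

def select_threshold(candidates, data, score):
    """Hyperparameter selection: keep the candidate scoring highest on data."""
    return max(candidates, key=lambda t: score(t, data))

# Hypothetical data: real labels flip at x = 0.6, but the (imperfect)
# synthetic generator placed the boundary at x = 0.4.
real_val = [(i / 100, int(i / 100 > 0.6)) for i in range(100)]
synthetic_val = [(i / 100, int(i / 100 > 0.4)) for i in range(100)]
candidates = [0.2, 0.4, 0.6, 0.8]

# Protocol 1: select using synthetic data only. This picks the shifted
# boundary 0.4 and misclassifies the real band (0.4, 0.6].
t1 = select_threshold(candidates, synthetic_val, accuracy)

# Protocol 2: one DP accuracy query per candidate against the real
# validation set held by the curator; by basic composition the total
# privacy cost is len(candidates) * epsilon.
random.seed(0)
t2 = select_threshold(candidates, real_val,
                      lambda t, d: dp_accuracy(t, d, epsilon=1.0))

print(t1, accuracy(t1, real_val))  # synthetic-only selection, real accuracy
print(t2, accuracy(t2, real_val))  # DP-query selection, real accuracy
```

Under this setup, the synthetic-only protocol selects the distorted boundary, while the DP-query protocol pays a small privacy budget but recovers the correct one; the paper's empirical study quantifies this trade-off on real medical datasets.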
Related Works
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8,404 citations
Calibrating Noise to Sensitivity in Private Data Analysis
2006 · 6,901 citations
Deep Learning with Differential Privacy
2016 · 5,634 citations
Federated Machine Learning
2019 · 5,604 citations
Communication-Efficient Learning of Deep Networks from Decentralized Data
2016 · 5,595 citations