This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A deep learning technique for imputing missing healthcare data
Citations: 45
Authors: 3
Year: 2019
Abstract
Missing data is a frequent occurrence in medical and health datasets. Analyzing datasets with missing data can lead to a loss of statistical power or to biased results. We address this issue with a novel deep learning technique for imputing missing values in health data. Our method extends an autoencoder into a deep learning architecture that can learn the hidden representations of data even when the data is perturbed by missing values (noise). Our model uses an overcomplete representation and is trained with denoising regularization. This allows the latent (hidden) layers of the model to extract the relationships between different variables effectively; these relationships are then used to reconstruct missing values. Our contributions include a new loss function designed to avoid local optima, which helps the model learn the real distribution of the variables in the dataset. We evaluate our method against other well-established imputation strategies (mean and median imputation, SVD, KNN, matrix factorization, and soft impute) on 48,350 Linked Birth/Infant Death Cohort Data records. Our experiments demonstrate that our method achieved a lower imputation mean squared error (MSE = 0.00988) than the other imputation methods (MSE ranging from 0.02 to 0.08). When imputation quality is assessed by using the imputed data for prediction tasks, the data imputed by our method yielded better results (F1 = 70.37%) than the data from the other imputation methods (F1 ranging from 66% to 69%).
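To make the idea in the abstract concrete, the following is a minimal sketch of imputation with an overcomplete denoising autoencoder, compared against mean imputation by MSE on the held-out entries. It is not the authors' implementation. The abstract does not specify the paper's new loss function, so this sketch assumes a plain squared error restricted to observed entries. The synthetic data, the function names (`mean_impute`, `dae_impute`, `imputation_mse`), the layer size, and all hyperparameters are likewise assumptions chosen for illustration.

```python
import numpy as np

def mean_impute(X, mask):
    """Baseline: fill missing entries (mask == False) with the observed column mean."""
    col_means = np.nanmean(np.where(mask, X, np.nan), axis=0)
    return np.where(mask, X, col_means)

def dae_impute(X, mask, hidden=32, epochs=500, lr=0.05, drop=0.2, seed=0):
    """One-hidden-layer denoising autoencoder (illustrative assumption, not the paper's model).

    hidden > number of columns gives an overcomplete representation;
    randomly zeroing inputs each epoch acts as the denoising corruption.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X0 = mean_impute(X, mask)                      # initial fill so the network sees no NaN
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    n_obs = mask.sum()
    for _ in range(epochs):
        keep = rng.random((n, d)) > drop           # corrupt: drop a random subset of inputs
        X_in = X0 * keep
        H = np.tanh(X_in @ W1 + b1)
        X_hat = H @ W2 + b2
        # Assumed loss: mean squared error over observed entries only.
        g_out = 2.0 * (X_hat - X0) * mask / n_obs
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)    # backprop through tanh
        W2 -= lr * (H.T @ g_out);    b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X_in.T @ g_hid); b1 -= lr * g_hid.sum(axis=0)
    X_hat = np.tanh(X0 @ W1 + b1) @ W2 + b2
    return np.where(mask, X, X_hat)                # keep observed values, fill the rest

def imputation_mse(X_true, X_imputed, mask):
    """MSE computed only on the entries that were missing."""
    return float(np.mean((X_true[~mask] - X_imputed[~mask]) ** 2))

# Synthetic, correlated data (hypothetical; stands in for real health records).
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))
X_full = np.tanh(latent @ rng.normal(size=(2, 6)))
mask = rng.random(X_full.shape) > 0.2              # True = observed, about 20% missing
X_miss = np.where(mask, X_full, np.nan)

X_mean = mean_impute(X_miss, mask)
X_dae = dae_impute(X_miss, mask)
print("mean imputation MSE:", imputation_mse(X_full, X_mean, mask))
print("autoencoder imputation MSE:", imputation_mse(X_full, X_dae, mask))
```

Scoring only the masked entries mirrors the kind of comparison reported in the abstract; the numbers this sketch prints come from toy data and are not comparable to the paper's results.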
Similar works
"Why Should I Trust You?"
2016 · 14,879 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,574 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 9,011 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,666 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,220 citations