This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction

2020 · 31 citations · PLoS ONE · Open Access


Abstract

BACKGROUND: The wide adoption of electronic health record (EHR) systems has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR systems poses a great challenge to the data analysis that supports clinical decision-making. The objective of this study is to develop a new methodological framework that addresses the missing data challenge and provides a reliable tool for predicting hospital readmission among heart failure patients.
METHODS: We used a Gaussian Process Latent Variable Model (GPLVM) to impute the missing values. Specifically, a lower-dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based imputation provides both a mean estimate and the uncertainty associated with that estimate. To incorporate this uncertainty in prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of input uncertainty and trained a support vector machine on each dataset. An optimal classifier was then identified by selecting the support vectors that maximize the separation margin of a newly sampled dataset and minimize the similarity with the pre-trained support vectors.
RESULTS: The proposed model was derived and validated using the PhysioNet MIMIC-III clinical database. The GPLVM imputation yielded normalized mean absolute errors of 0.11 and 0.12 when 20% and 30% of instances contained missing values, respectively, and the confidence bounds of the estimates captured 97% of the true values. The cSVM model achieved an average area under the curve of 0.68, which improves the prediction accuracy by 7% compared with some existing classifiers.
CONCLUSIONS: The proposed method provides accurate imputation of missing values and achieves better prediction performance than existing models that can only handle deterministic inputs.
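
The abstract mentions two ideas that are easy to illustrate in a few lines: drawing several plausible datasets from the imputation uncertainty, and scoring imputations by normalized mean absolute error and confidence-bound coverage. The following is a minimal sketch only, not the authors' implementation. It assumes each imputed value is summarized by a Gaussian mean and standard deviation, that the error is normalized by the range of the true values, and that the confidence bounds are 95% intervals (z = 1.96). The abstract confirms none of these details, and the function names `sample_datasets`, `normalized_mae`, and `coverage` are hypothetical.

```python
import random


def sample_datasets(means, stds, n_samples, seed=0):
    """Draw n_samples datasets from per-value Gaussian uncertainty.

    means and stds are lists of rows; a std of 0 marks an observed
    (non-imputed) value, which is copied unchanged into every sample.
    Assumption: imputation uncertainty is Gaussian and independent per value.
    """
    rng = random.Random(seed)
    return [
        [
            [rng.gauss(m, s) if s > 0 else m for m, s in zip(row_m, row_s)]
            for row_m, row_s in zip(means, stds)
        ]
        for _ in range(n_samples)
    ]


def normalized_mae(true_vals, est_vals):
    """Mean absolute error divided by the range of the true values.

    Assumption: range normalization; the paper may normalize differently.
    """
    span = max(true_vals) - min(true_vals)
    mae = sum(abs(t - e) for t, e in zip(true_vals, est_vals)) / len(true_vals)
    return mae / span


def coverage(true_vals, est_means, est_stds, z=1.96):
    """Fraction of true values inside mean +/- z * std.

    Assumption: z = 1.96, i.e. 95% bounds under a Gaussian.
    """
    hits = sum(
        1
        for t, m, s in zip(true_vals, est_means, est_stds)
        if abs(t - m) <= z * s
    )
    return hits / len(true_vals)
```

In a pipeline of the kind the abstract describes, each sampled dataset would be passed to a separate classifier, and the two metrics would be computed on values that were masked on purpose so that the ground truth is known.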

Institutions

Texas Tech University (US)

Topics

Machine Learning in Healthcare · Heart Failure Treatment and Management · Sepsis Diagnosis and Treatment
