This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Integration of genetic and clinical information to improve imputation of data missing from electronic health records
Citations: 23
Authors: 3
Year: 2019
Abstract
OBJECTIVE: Clinical data on patients' measurements and treatment history, stored in electronic health record (EHR) systems, are starting to be mined for better treatment options and disease associations. A primary challenge in using EHR data is the considerable amount of missing data. Failure to address this issue can introduce significant bias into EHR-based research. Currently, imputation methods rely on correlations among the structured phenotype variables in the EHR. However, genetic studies have shown that many EHR-based phenotypes have a heritable component, suggesting that measured genetic variants might be useful for imputing missing data. In this article, we developed a computational model that incorporates patients' genetic information to perform EHR data imputation.
MATERIALS AND METHODS: We used the associations of individual single-nucleotide polymorphisms (SNPs) with phenotype variables in the EHR as input to construct a genetic risk score that quantifies the genetic contribution to the phenotype. Multiple approaches to constructing the genetic risk score were evaluated for optimal performance. The genetic score, along with phenotype correlation, is then used as a predictor to impute the missing values.
RESULTS: To demonstrate the method's performance, we applied our model to impute missing cardiovascular-related measurements, including low-density lipoprotein, heart failure, and aortic aneurysm disease, in the electronic Medical Records and Genomics data. The integration method improved the area under the curve of imputation for binary phenotypes and decreased the root-mean-square error for continuous phenotypes.
CONCLUSION: Compared with standard imputation approaches, incorporating genetic information offers a novel approach that can use more of the EHR data to improve the imputation of missing data.
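The idea described under MATERIALS AND METHODS can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple weighted-sum genetic risk score and an ordinary least-squares imputer, and it runs on synthetic data. All names (`genetic_risk_score`, `impute_continuous`, `effect_sizes`, the simulated LDL-like variable) are hypothetical and chosen only for illustration.

```python
import numpy as np


def genetic_risk_score(genotypes, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2) per patient.

    Assumption: effect_sizes are per-SNP association estimates obtained
    elsewhere; here they are simulated, not taken from the paper.
    """
    return genotypes @ effect_sizes


def impute_continuous(target, predictors):
    """Fill NaN entries of `target` using a least-squares fit on observed rows."""
    observed = ~np.isnan(target)
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design[observed], target[observed], rcond=None)
    filled = target.copy()
    filled[~observed] = design[~observed] @ coef
    return filled


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic cohort (illustrative only, not EHR data).
rng = np.random.default_rng(0)
n_patients, n_snps = 500, 20
genotypes = rng.integers(0, 3, size=(n_patients, n_snps)).astype(float)
effect_sizes = rng.normal(0.0, 0.5, size=n_snps)
grs = genetic_risk_score(genotypes, effect_sizes)

# A correlated phenotype variable plus a heritable component drive the target.
correlated = rng.normal(size=n_patients)
truth = 100 + 5 * grs + 8 * correlated + rng.normal(0, 2, size=n_patients)

# Hide roughly 30% of the values to mimic missing measurements.
missing = rng.random(n_patients) < 0.3
observed_ldl = truth.copy()
observed_ldl[missing] = np.nan

# Baseline: phenotype correlation only. Integrated: phenotype plus GRS.
pheno_only = impute_continuous(observed_ldl, correlated[:, None])
with_grs = impute_continuous(observed_ldl, np.column_stack([correlated, grs]))

rmse_pheno = rmse(pheno_only[missing], truth[missing])
rmse_grs = rmse(with_grs[missing], truth[missing])
print(f"RMSE phenotype only: {rmse_pheno:.2f}, with GRS: {rmse_grs:.2f}")
```

In this simulation the target is constructed to have a genetic component, so adding the score lowers the error by design; the sketch only shows the mechanics of combining a genetic score with phenotype correlation and says nothing about effect sizes on real data.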