Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies

2020·30 Zitationen·Journal of the American Medical Informatics AssociationOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2020

Jahr

Abstract

OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning. MATERIALS AND METHODS: We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performances of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network. RESULTS: For logistic regression, using UMLS delivered the optimal area under the receiver operating characteristics (AUROC) results in both dengue hemorrhagic fever (81.15%) and pancreatic cancer (80.53%) tasks. For recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%), second only (AUROC 85.55%) to PheWAS (AUROC 85.87%) for dengue hemorrhagic fever prediction. DISCUSSION/CONCLUSION: In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performances. In particular, UMLS is consistently 1 of the best-performing ones. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.

Autoren

Institutionen

The University of Texas Health Science Center at Houston(US)

Themen

Machine Learning in HealthcareArtificial Intelligence in HealthcareMedical Coding and Health Information

Volltext beim Verlag öffnen

Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen