Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model
40
Zitationen
3
Autoren
2023
Jahr
Abstract
Abstract Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving offer alternatives to real EHRs for machine learning (ML) and statistical analysis. However, generating high-fidelity EHR data in its original, high-dimensional form poses challenges for existing methods. We propose Hierarchical Autoregressive Language mOdel () for generating longitudinal, high-dimensional EHR, which preserve the statistical properties of real EHRs and can train accurate ML models without privacy concerns. generates a probability density function over medical codes, clinical visits, and patient records, allowing for generating realistic EHR data without requiring variable selection or aggregation. Extensive experiments demonstrated that can generate high-fidelity data with high-dimensional disease code probabilities closely mirroring (above 0.9 R 2 correlation) real EHR data. also enhances the accuracy of predictive modeling and enables downstream ML models to attain similar accuracy as models trained on genuine data.
Ähnliche Arbeiten
"Why Should I Trust You?"
2016 · 14.830 Zit.
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10.565 Zit.
A Comprehensive Survey on Graph Neural Networks
2020 · 9.010 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.649 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8.202 Zit.