Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model

2023 · 40 citations · Nature Communications · Open Access

Abstract

Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving offer an alternative to real EHRs for machine learning (ML) and statistical analysis. However, generating high-fidelity EHR data in its original, high-dimensional form poses challenges for existing methods. We propose the Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHRs that preserve the statistical properties of real EHRs and can be used to train accurate ML models without privacy concerns. HALO generates a probability density function over medical codes, clinical visits, and patient records, which allows it to generate realistic EHR data without requiring variable selection or aggregation. Extensive experiments demonstrated that HALO can generate high-fidelity data whose high-dimensional disease code probabilities closely mirror those of real EHR data (R² correlation above 0.9). HALO also enhances the accuracy of predictive modeling and enables downstream ML models to attain accuracy similar to that of models trained on genuine data.
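The abstract reports that disease code probabilities in the synthetic data track those of real EHRs with an R² above 0.9. The Python sketch below shows one way such a comparison could be computed; it is not the paper's evaluation code. It assumes a hypothetical record layout (a patient is a list of visits, each visit a set of code strings), estimates each code's probability as the fraction of visits that contain it, and reports the squared Pearson correlation between the real and synthetic probability vectors. The names `code_probabilities` and `r_squared` are illustrative only.

```python
from collections import Counter

# Hypothetical layout (assumption): a patient is a list of visits,
# and each visit is a set of medical code strings.
Patient = list[set[str]]


def code_probabilities(patients: list[Patient]) -> dict[str, float]:
    """Fraction of all visits (pooled across patients) containing each code."""
    counts: Counter = Counter()
    n_visits = 0
    for patient in patients:
        for visit in patient:
            counts.update(visit)
            n_visits += 1
    if n_visits == 0:
        return {}
    return {code: c / n_visits for code, c in counts.items()}


def r_squared(real: list[Patient], synthetic: list[Patient]) -> float:
    """Squared Pearson correlation between per-code probabilities."""
    p_real = code_probabilities(real)
    p_syn = code_probabilities(synthetic)
    # Codes missing from one dataset get probability 0 there.
    codes = sorted(set(p_real) | set(p_syn))
    x = [p_real.get(c, 0.0) for c in codes]
    y = [p_syn.get(c, 0.0) for c in codes]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("code probabilities are constant; correlation undefined")
    return cov * cov / (vx * vy)


if __name__ == "__main__":
    real = [[{"A", "B"}, {"A"}], [{"C"}]]
    synthetic = [[{"A"}, {"B"}], [{"B"}]]
    print(r_squared(real, synthetic))
```

Because the correlation is squared, a synthetic dataset whose code probabilities are perfectly anti-correlated with the real ones would also score 1.0, so in practice one would also check the sign of the correlation or plot the two probability vectors against each other.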

Topics

Machine Learning in Healthcare; Topic Modeling; Artificial Intelligence in Healthcare
