This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Abstract 4210: Transformer and pretraining on external EHR cohort boosts infection risk prediction in hematologic malignancies
Citations: 0
Authors: 2
Year: 2026
Abstract
Infections are a major early driver of morbidity and mortality in hematologic malignancies, particularly in chronic lymphocytic leukemia (CLL), due to intrinsic immune dysfunction and therapy-induced immunosuppression. Predicting infection risk and identifying contributing factors prior to treatment is warranted; however, the limited sample size of local EHRs restricts our ability to predict them accurately. We demonstrate that an attention-based transformer pretrained on an external cohort (CLL, lymphomas, multiple myeloma (MM)) and fine-tuned on a local CLL cohort enhances infection risk prediction. We used multimodal genomic and clinical data (EHR, labs, treatment) from two independent datasets, Flatiron CLL Custom Spotlight (FCCS; n=1,725, USA) and DALYCARE (n=3,418; Denmark), the latter including CLL, lymphomas, and MM. In the absence of standard infection labels, prescribed antibiotics served as a proxy; infection prevalence was 33.6% in FCCS and 64.4% in DALYCARE. After schema harmonization and temporal aggregation across four time windows, we derived 389 features (FCCS) and 688 (DALYCARE), with 249 shared features covering treatment, demographics, labs, vitals, and omics information. Models were trained and evaluated to predict infection risk 24 weeks post first-line treatment with 5-fold cross-validation per cohort. Cross-cohort generalization was assessed via self-supervised learning (SSL) pretraining on the full DALYCARE cohort followed by fine-tuning on FCCS. On the 249 shared features, SSL pretraining on DALYCARE increased the C-index versus training only on FCCS, from 0.63±0.05 to 0.66±0.07, with a consistent PR-AUC gain (0.42±0.06 to 0.46±0.10), indicating that knowledge transferred from a larger external cohort, even one with mixed lymphoid subtypes, can mitigate performance loss when sample size is limited.
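The C-index used for evaluation above (Harrell's concordance index) measures how well predicted risk scores rank subjects by time to the event: it is the fraction of comparable subject pairs in which the higher-risk subject experiences the event first. A minimal pure-Python sketch for illustration, not the authors' implementation:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    times       -- observed time to event or censoring, per subject
    events      -- 1 if the event (e.g. infection) was observed, 0 if censored
    risk_scores -- model-predicted risk, higher = riskier

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's time. The pair is concordant when the
    model assigned i a higher risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A perfect inverse ranking of risk against event time yields 1.0, a fully reversed ranking 0.0, and a constant score (no discrimination) 0.5, which is why the reported gain from 0.63 to 0.66 represents movement away from chance-level ranking.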
Without DALYCARE pretraining and using the full FCCS feature set (389 features), the transformer reached a higher C-index (0.69±0.02 vs 0.66±0.07), suggesting that richer cohort features still outweigh pretraining benefits. The transformer also outperformed baselines: C-index 0.69±0.02 (FCCS) and 0.68±0.02 (DALYCARE) vs CoxPH 0.59±0.03 and 0.57±0.01; PR-AUC 0.73±0.07 and 0.73±0.04 vs LightGBM 0.60±0.04 and 0.59±0.06. Attention-based interpretability scores and permutation importance are aligned across both cohorts and with known risk factors, including del(17p), renal function markers (eGFR, K+, Na+), and prior infection history. Transformers transfer knowledge across external cohorts, outperform baselines in infection risk prediction, and offer improved interpretability. While focused on CLL, this work demonstrates that leveraging external cohorts and other diseases can improve local predictions when samples are limited. Future work will combine transferred prior knowledge with richer cohort features to further enhance performance. Refs: G. Argoty et al., Pretrained transformers in clinical studies, Nat Commun, 2025. Citation Format: Banafshe Felfeliyan, Natasha Markuzon. Transformer and pretraining on external EHR cohort boosts infection risk prediction in hematologic malignancies [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4210.
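Permutation importance, used above to cross-check the attention-based scores, quantifies how much a performance metric drops when one feature column is randomly shuffled, severing its link to the label while leaving its marginal distribution intact. A minimal sketch under generic assumptions (the scoring function and toy data below are illustrative placeholders, not the study's model or cohorts):

```python
import random

def permutation_importance(score_fn, X, y, feature_idx, n_repeats=20, seed=0):
    """Mean drop in score when feature `feature_idx` is shuffled.

    score_fn(X, y) -> float, higher is better (e.g. accuracy, C-index).
    X is a list of feature-row lists; y is the list of labels.
    """
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    column = [row[feature_idx] for row in X]
    drops = []
    for _ in range(n_repeats):
        shuffled = column[:]
        rng.shuffle(shuffled)
        # Rebuild X with only the target column permuted.
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, shuffled)]
        drops.append(baseline - score_fn(X_perm, y))
    return sum(drops) / n_repeats
```

An informative feature produces a clearly positive mean drop, while shuffling a pure-noise feature leaves the score unchanged; agreement between this ranking and the transformer's attention scores is what the abstract reports for del(17p), renal markers, and prior infection history.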
Related works
"Why Should I Trust You?"
2016 · 14,522 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,813 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,470 citations