This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Abstract 4210: Transformer and pretraining on external EHR cohort boosts infection risk prediction in hematologic malignancies
Citations: 0
Authors: 2
Year: 2026
Abstract
Infections are a major early driver of morbidity and mortality in hematologic malignancies, particularly in chronic lymphocytic leukemia (CLL), due to intrinsic immune dysfunction and therapy-induced immunosuppression. Predicting infection risk and identifying contributing factors prior to treatment is warranted; however, the limited sample size of local EHRs restricts our ability to predict them accurately. We demonstrate that an attention-based transformer pretrained on an external cohort (CLL, lymphomas, multiple myeloma (MM)) and fine-tuned on a local CLL cohort enhances infection risk prediction. We used multimodal genomic and clinical data (EHR, labs, treatment) from two independent datasets, Flatiron CLL Custom Spotlight (FCCS; n=1,725, USA) and DALYCARE (n=3,418; Denmark), the latter including CLL, lymphomas, and MM. In the absence of standard infection labels, prescribed antibiotics served as a proxy; infection prevalence was 33.6% in FCCS and 64.4% in DALYCARE. After schema harmonization and temporal aggregation across four time windows, we derived 389 features (FCCS) and 688 (DALYCARE), with 249 shared features covering treatment, demographics, labs, vitals, and omics information. Models were trained and evaluated to predict infection risk 24 weeks post first-line treatment with 5-fold cross-validation per cohort. Cross-cohort generalization was assessed via self-supervised learning (SSL) pretraining on the full DALYCARE cohort followed by fine-tuning on FCCS. On the 249 shared features, SSL pretraining on DALYCARE increased the C-index versus training only on FCCS, from 0.63±0.05 to 0.66±0.07, with a consistent PR-AUC gain (0.42±0.06 to 0.46±0.10), indicating that knowledge transferred from a larger external cohort, even one with mixed lymphoid subtypes, can mitigate performance loss when sample size is limited.
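The C-index used for evaluation above (Harrell's concordance index) measures how well predicted risk scores rank subjects by time to the event: it is the fraction of comparable subject pairs in which the higher-risk subject experiences the event first. A minimal pure-Python sketch for illustration, not the authors' implementation:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    times       -- observed time to event or censoring, per subject
    events      -- 1 if the event (e.g. infection) was observed, 0 if censored
    risk_scores -- model-predicted risk, higher = riskier

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's time. The pair is concordant when the
    model assigned i a higher risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A perfect inverse ranking of risk against event time yields 1.0, a fully reversed ranking 0.0, and a constant score (no discrimination) 0.5, which is why the reported gain from 0.63 to 0.66 represents movement away from chance-level ranking.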
Without DALYCARE pretraining and using the full FCCS feature set (389 features), the transformer reached a higher C-index (0.69±0.02 vs 0.66±0.07), suggesting that richer cohort features still outweigh pretraining benefits. The transformer also outperformed baselines: C-index 0.69±0.02 (FCCS) and 0.68±0.02 (DALYCARE) vs CoxPH 0.59±0.03 and 0.57±0.01; PR-AUC 0.73±0.07 and 0.73±0.04 vs LightGBM 0.60±0.04 and 0.59±0.06. Attention-based interpretability scores and permutation importance are aligned across both cohorts and with known risk factors, including del(17p), renal function markers (eGFR, K+, Na+), and prior infection history. Transformers transfer knowledge across external cohorts, outperform baselines in infection risk prediction, and offer improved interpretability. While focused on CLL, this work demonstrates that leveraging external cohorts and other diseases can improve local predictions when samples are limited. Future work will combine transferred prior knowledge with richer cohort features to further enhance performance. Refs: G. Argoty et al., Pretrained transformers in clinical studies, Nat Commun, 2025. Citation Format: Banafshe Felfeliyan, Natasha Markuzon. Transformer and pretraining on external EHR cohort boosts infection risk prediction in hematologic malignancies [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4210.
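Permutation importance, used above to cross-check the attention-based scores, quantifies how much a performance metric drops when one feature column is randomly shuffled, severing its link to the label while leaving its marginal distribution intact. A minimal sketch under generic assumptions (the scoring function and toy data below are illustrative placeholders, not the study's model or cohorts):

```python
import random

def permutation_importance(score_fn, X, y, feature_idx, n_repeats=20, seed=0):
    """Mean drop in score when feature `feature_idx` is shuffled.

    score_fn(X, y) -> float, higher is better (e.g. accuracy, C-index).
    X is a list of feature-row lists; y is the list of labels.
    """
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    column = [row[feature_idx] for row in X]
    drops = []
    for _ in range(n_repeats):
        shuffled = column[:]
        rng.shuffle(shuffled)
        # Rebuild X with only the target column permuted.
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, shuffled)]
        drops.append(baseline - score_fn(X_perm, y))
    return sum(drops) / n_repeats
```

An informative feature produces a clearly positive mean drop, while shuffling a pure-noise feature leaves the score unchanged; agreement between this ranking and the transformer's attention scores is what the abstract reports for del(17p), renal markers, and prior infection history.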
Related works
"Why Should I Trust You?"
2016 · 14,522 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,813 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,470 citations