This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Signal Fidelity Index-aware calibration for addressing distributional shift in predictive modeling across heterogeneous real-world data
Citations: 2
Authors: 6
Year: 2025
Abstract
Machine learning models trained on real-world data (RWD) often lose performance when deployed across different settings because of distributional shift. A fundamental but under-explored contributor to this degradation is the decay of diagnostic signals: systematic variability in diagnostic quality and consistency across institutional contexts, which affects the reliability of the clinical codes used for model training and prediction.

This study aimed to develop and evaluate a Signal Fidelity Index (SFI) that quantifies diagnostic signal decay at the patient level across diverse clinical conditions, and to assess whether SFI-aware calibration improves model performance relative to established calibration methods without requiring outcome labels in target domains after initial method development.

We developed a comprehensive simulation framework using synthetic patient datasets across six clinically diverse phenotypes: dementia, geriatric bipolar disorder, fibromyalgia, adult ADHD, type 2 diabetes, and hypertension. Each phenotype included independent simulation batches with varying demographic compositions and data quality characteristics. The SFI was constructed from six components: diagnostic specificity, temporal consistency, entropy, contextual concordance, medication alignment, and trajectory stability. We implemented SFI-aware calibration using a multiplicative adjustment formula with phenotype-specific calibration parameters optimized through supervised parameter development, then evaluated performance in label-free deployment across heterogeneous testing datasets. We compared SFI-aware calibration against established baseline calibration methods.

SFI-aware calibration significantly improved predictive performance over both uncalibrated predictions and all baseline methods across nearly all of the six phenotypes (Cohen's d = 0.603-5.002, [Formula: see text]). Performance improvements varied by phenotype complexity, with F1-score gains ranging from +4.2% for fibromyalgia to +34.9% for dementia, and AUC gains ranging from +4.7% to +40.1%. Traditional calibration methods often degraded performance: isotonic regression failed universally (AUC values fell to 0.51-0.56 across all phenotypes), and Platt scaling showed inconsistent, phenotype-dependent effects. Brier score decomposition revealed that SFI-aware calibration improved performance through a dual mechanism: reliability reductions ranging from 11% to 29% and resolution increases ranging from +35% to +238%. Notably, even well-diagnosed conditions with standardized diagnostic criteria (type 2 diabetes, hypertension) showed substantial benefits (+8.0% to +13.3% F1-score, +25.1% to +40.1% AUC), suggesting that diagnostic signal variability affects all EHR-based phenotyping.

These findings demonstrate that diagnostic signal decay is a tractable problem that can be addressed systematically through patient-level, fidelity-aware calibration strategies. SFI-aware calibration offers a practical approach for enhancing the performance of clinical prediction models across diverse healthcare contexts: it requires supervised parameter development once per phenotype, followed by label-free deployment to unlimited target populations. The method consistently outperforms established calibration techniques while avoiding their tendency to degrade performance, which makes it particularly suitable for large-scale administrative datasets where outcome labels are unavailable. Multi-phenotype validation confirms generalizability across clinical conditions ranging from complex, under-diagnosed phenotypes to well-diagnosed conditions with standardized criteria.
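The abstract reports reliability and resolution terms from a Brier score decomposition but does not say which decomposition or binning scheme was used. As a minimal sketch, assuming the standard Murphy decomposition (Brier score = reliability - resolution + uncertainty) over equal-width probability bins, the terms could be computed as follows. The function name `brier_decomposition` and the parameter `n_bins` are illustrative choices, not taken from the paper.

```python
import numpy as np


def brier_decomposition(y_true, y_prob, n_bins=10):
    """Murphy decomposition of the Brier score over equal-width bins.

    Assumption: this is the textbook decomposition, not necessarily the
    exact procedure used in the paper. Lower reliability and higher
    resolution are better.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    n = y_true.size
    base_rate = y_true.mean()

    # Interior bin edges on [0, 1]; digitize maps each forecast to 0..n_bins-1.
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_ids = np.digitize(y_prob, interior_edges)

    reliability = 0.0
    resolution = 0.0
    for k in range(n_bins):
        mask = bin_ids == k
        n_k = int(mask.sum())
        if n_k == 0:
            continue  # empty bins contribute nothing
        mean_forecast = y_prob[mask].mean()
        observed_rate = y_true[mask].mean()
        reliability += n_k * (mean_forecast - observed_rate) ** 2
        resolution += n_k * (observed_rate - base_rate) ** 2

    return {
        "brier": float(np.mean((y_prob - y_true) ** 2)),
        "reliability": reliability / n,
        "resolution": resolution / n,
        "uncertainty": float(base_rate * (1.0 - base_rate)),
    }


# Toy data: two forecast levels, each matching its observed event rate.
y_prob = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
parts = brier_decomposition(y_true, y_prob)
print(parts)
```

The identity brier = reliability - resolution + uncertainty holds exactly only when all forecasts within a bin are equal, as in the toy data. With continuous forecasts, the binned terms are an approximation whose quality depends on `n_bins`.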