This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Detecting and Remediating Harmful Data Shifts for the Responsible Deployment of Clinical AI Models
Citations: 28 · Authors: 11 · Year: 2025
Abstract
Importance: Clinical artificial intelligence (AI) systems are susceptible to performance degradation due to data shifts, which can lead to erroneous predictions and potential patient harm. Proactively detecting and mitigating these shifts is crucial for maintaining AI effectiveness and safety in clinical practice.

Objectives: To develop and evaluate a proactive, label-agnostic monitoring pipeline to detect and mitigate harmful data shifts in clinical AI systems and to assess the use of transfer learning and continual learning strategies in maintaining model performance.

Design, Setting, and Participants: This prognostic study was conducted using electronic health record data for admissions to general internal medicine wards of 7 large hospitals (5 academic and 2 community) in Toronto, Canada, between January 1, 2010, and August 31, 2020. Inpatients (aged ≥18 years) with a hospital stay of at least 24 hours were included. Data analysis was performed from January to August 2022.

Exposures: Data shifts due to changes in hospital type, critical laboratory assays, patient demographics, admission type, and the COVID-19 pandemic.

Main Outcomes and Measures: The primary outcome was predictive performance for all-cause in-hospital mortality within the next 2 weeks, evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Data shifts were detected using a label-agnostic monitoring pipeline employing a black box shift estimator with maximum mean discrepancy testing.

Results: Data were available for 143 049 adult inpatients (mean [SD] age, 67.8 [19.6] years; 50.7% female). Significant data shifts were detected as a result of changes in younger age groups and admissions from nursing homes and acute care centers, transfers from community to academic hospitals, and changes in brain natriuretic peptide and D-dimer. Transfer learning improved model performance at community hospitals in a hospital type-dependent manner (ΔAUROC [SD], 0.05 [0.03]; ΔAUPRC [SD], 0.06 [0.04]). During the COVID-19 pandemic, drift-triggered continual learning improved overall model performance (ΔAUROC [SD], 0.44 [0.02]; P = .007, Mann-Whitney U test).

Conclusions and Relevance: In this prognostic study, a proactive, label-agnostic monitoring pipeline detected harmful data shifts for a clinical AI system predicting in-hospital mortality. Transfer learning and drift-triggered continual learning strategies mitigated performance degradation, maintaining model performance across health care settings. These findings suggest that the approach used here may ensure the robust and equitable deployment of clinical AI models. Future research should explore the generalizability of this framework across diverse clinical domains, data modalities, and longer deployment periods to further validate its effectiveness.
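The abstract states that shifts were detected with a label-agnostic pipeline using a black box shift estimator with maximum mean discrepancy (MMD) testing. The paper's own implementation is not shown on this page; as an illustration only, the sketch below implements a generic MMD two-sample permutation test with an RBF kernel on synthetic data. All names, the `gamma` bandwidth, and the simulated "reference" and "deployment" samples are placeholder assumptions, not the authors' method.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel on pairwise squared Euclidean distances.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

def mmd_permutation_test(X, Y, gamma=1.0, n_perm=200, seed=0):
    # Null hypothesis: X and Y come from the same distribution.
    # p-value = share of random re-splits with MMD >= observed.
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, gamma)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated reference data vs. a mean-shifted "deployment" sample.
rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, size=(100, 5))
shifted = rng.normal(1.5, 1.0, size=(100, 5))
mmd_obs, p_shift = mmd_permutation_test(ref, shifted)
```

In a monitoring setting, a small p-value on incoming data relative to the training distribution would flag a potential shift and could trigger retraining, analogous to the drift-triggered continual learning described above.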
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,674 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,583 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,105 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,862 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations