OpenAlex · Updated hourly · Last updated: 28 Mar 2026, 16:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Machine Learning Accuracy in Healthcare Risk Prediction: Algorithms, Datasets, and Effect Sizes: A Meta-Analysis

2021 · 0 citations · Open Access
Open full text at publisher

Citations: 0 · Authors: 2 · Year: 2021

Abstract

This study addressed the problem that machine learning healthcare risk prediction research reports “accuracy” inconsistently across algorithms, datasets, and validation designs, making it difficult for clinicians and health system leaders to identify dependable models for real-world deployment. Using a quantitative, cross-sectional, case-based design, each eligible peer-reviewed paper was treated as a “case” drawn from large-scale clinical data environments, including enterprise electronic health record implementations and multi-institution critical-care repositories. The purpose was to quantify comparative performance across model families, document dataset and reporting patterns, and estimate effect-size-style differences under heterogeneous conditions. The final sample included N = 110 studies spanning 14 outcome categories, dominated by EHR/ICU datasets (62.7%, n = 69), followed by claims/administrative (14.5%, n = 16), registry/cohort (11.8%, n = 13), and imaging or multimodal (10.9%, n = 12); 88 studies provided extractable AUROC and/or AUPRC for quantitative synthesis. Key variables included algorithm family (regularized linear, ensemble, deep learning), dataset modality, validation type (internal-only vs. external), outcome category and outcome prevalence/imbalance, and reporting indicators (calibration, missingness, and imbalance handling). The analysis plan applied random-effects pooling with subgroup and moderator comparisons to explain heterogeneity. Headline findings showed an overall pooled AUROC = 0.83 (95% CI: 0.81–0.85; I² = 78%), with ensemble models (AUROC = 0.85, 0.83–0.87) and deep learning (AUROC = 0.86, 0.84–0.88) outperforming regularized linear baselines (AUROC = 0.79, 0.77–0.81), yielding ΔAUROC = +0.06 for ensembles versus linear models; external validation reduced pooled AUROC to 0.80 (0.78–0.82) compared with 0.85 (0.83–0.87) for internal-only studies (ΔAUROC = −0.05).
Where reported, minority-class performance differed meaningfully under imbalance, with pooled AUPRC 0.42 for deep learning vs 0.36 for ensembles, and calibration evidence was limited but centered near reliability when reported (median calibration slope = 0.94; IQR 0.86–1.03). These results imply that model choice should be guided not only by discrimination but also by validation breadth and prevalence-sensitive metrics, and that stronger standards for external validation and calibration reporting are necessary for safer enterprise adoption of risk prediction tools.
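The abstract's pooled estimates (e.g., AUROC = 0.83, 95% CI 0.81–0.85, I² = 78%) come from random-effects pooling. As a minimal sketch of how such figures are typically computed, the snippet below applies DerSimonian–Laird pooling to hypothetical per-study AUROC values and standard errors (illustrative inputs only, not the study's actual data; the function name `random_effects_pool` is an assumption for this example):

```python
import math

# Hypothetical per-study AUROC estimates and standard errors
# (illustrative only; not the data from this meta-analysis).
aurocs = [0.81, 0.85, 0.79, 0.88, 0.83, 0.80]
ses = [0.02, 0.03, 0.025, 0.02, 0.03, 0.025]

def random_effects_pool(effects, std_errs):
    """DerSimonian-Laird random-effects pooling with I^2 heterogeneity."""
    w = [1.0 / se**2 for se in std_errs]                  # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q statistic and tau^2 (between-study variance)
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's variance
    w_re = [1.0 / (se**2 + tau2) for se in std_errs]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # I^2 in percent
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, i2

pooled, (lo, hi), i2 = random_effects_pool(aurocs, ses)
print(f"pooled AUROC = {pooled:.3f} (95% CI {lo:.3f}-{hi:.3f}), I2 = {i2:.0f}%")
```

Subgroup comparisons (e.g., ensemble vs. regularized linear, internal-only vs. external validation) would simply run the same pooling on each subgroup and compare the pooled estimates.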


Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Sepsis Diagnosis and Treatment