This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Not the Models You Are Looking For: Traditional ML Outperforms LLMs in Clinical Prediction Tasks
5 citations · 10 authors · 2024
Abstract
Objectives: To determine the extent to which current Large Language Models (LLMs) can serve as substitutes for traditional machine learning (ML) as clinical predictors using data from electronic health records (EHRs), we investigated various factors that can impact their adoption, including overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.
Materials and Methods: We evaluated GPT-3.5, GPT-4, and ML (as gradient-boosting trees) on clinical prediction tasks in EHR data from Vanderbilt University Medical Center (VUMC) and MIMIC IV. We measured predictive performance with AUROC and model calibration using Brier Score. To evaluate the impact of data privacy protections, we assessed AUROC when demographic variables are generalized. We evaluated algorithmic fairness using equalized odds and statistical parity across race, sex, and age of patients. We also considered the impact of using in-context learning by incorporating labeled examples within the prompt.
Results: Traditional ML (AUROC: 0.847, 0.894 (VUMC, MIMIC)) substantially outperformed GPT-3.5 (AUROC: 0.537, 0.517) and GPT-4 (AUROC: 0.629, 0.602) (with and without in-context learning) in predictive performance and output probability calibration (Brier Score (ML vs GPT-3.5 vs GPT-4): 0.134 versus 0.384 versus 0.251, 0.042 versus 0.06 versus 0.219). Traditional ML is more robust than GPT-3.5 and GPT-4 to generalizing demographic information to protect privacy. GPT-4 is the fairest model according to our selected metrics but at the cost of poor model performance.
Conclusion: These findings suggest that LLMs are much less effective and robust than locally-trained ML for clinical prediction tasks, but they are getting better over time.
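The four metrics named in the abstract (AUROC, Brier Score, statistical parity, equalized odds) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names, the toy labels and probabilities, the binary group encoding, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not say which exact formulation of the fairness metrics the paper uses. Here both fairness metrics are expressed as gaps between two groups, where 0 means parity.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes
    (lower means better calibrated and more accurate)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def _positive_rate(y_pred: Sequence[int], keep: Sequence[bool]) -> float:
    """Share of positive predictions among the rows selected by `keep`."""
    selected = [p for p, k in zip(y_pred, keep) if k]
    if not selected:
        raise ValueError("empty subgroup")
    return sum(selected) / len(selected)


def statistical_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rate between group 1 and group 0."""
    rate_1 = _positive_rate(y_pred, [g == 1 for g in group])
    rate_0 = _positive_rate(y_pred, [g == 0 for g in group])
    return abs(rate_1 - rate_0)


def equalized_odds_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Larger of the absolute TPR gap and the absolute FPR gap between groups."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rate_1 = _positive_rate(y_pred, [g == 1 and y == label for g, y in zip(group, y_true)])
        rate_0 = _positive_rate(y_pred, [g == 0 and y == label for g, y in zip(group, y_true)])
        gaps.append(abs(rate_1 - rate_0))
    return max(gaps)


# Toy data (hypothetical, not taken from VUMC or MIMIC IV).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
group = [0, 0, 0, 0, 1, 1, 1, 1]            # assumed binary demographic attribute
y_pred = [int(p >= 0.5) for p in y_prob]    # assumed 0.5 decision threshold

print("AUROC:", auroc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Statistical parity difference:", statistical_parity_difference(y_pred, group))
print("Equalized odds difference:", equalized_odds_difference(y_true, y_pred, group))
```

AUROC depends only on how the scores rank cases, whereas the Brier Score also penalizes poorly scaled probabilities, which is one reason a study would report both. An AUROC near 0.5, like the GPT-3.5 figures in the abstract, indicates ranking that is close to chance.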
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations