This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Diagnostic Accuracy of a Large Language Model (ChatGPT-4) for Patients Admitted to a Community Hospital Medical Intensive Care Unit: A Retrospective Case Study
Citations: 1
Authors: 9
Year: 2025
Abstract
Background: The future of artificial intelligence in medicine includes the use of machine learning and large language models as point-of-care tools to improve diagnostic accuracy at the time of admission to an acute care hospital. The large language model ChatGPT-4 has been shown to diagnose complex medical conditions with accuracy comparable to that of experienced clinicians. However, most published studies involved curated cases or examination-like questions and were not conducted at the point of care. To test the hypothesis that ChatGPT-4 can make an accurate medical diagnosis using real-world medical cases and a convenient cut-and-paste strategy, we performed a retrospective case study involving critically ill patients admitted to a community hospital medical intensive care unit.

Methods: A redacted history and physical (H&P) was essentially cut and pasted into ChatGPT-4 with uniform instructions to make a leading diagnosis and a list of 5 possibilities as a differential diagnosis. All features that could be used to identify patients were removed to ensure privacy and HIPAA compliance. The ChatGPT-4 diagnoses were compared with critical care physician diagnoses, using a blinded longitudinal chart review as the ground truth diagnosis.

Results: A total of 120 randomly selected cases were included in the study. The diagnostic accuracy was 88.3% for physicians and 85.0% for ChatGPT-4, with no significant difference by McNemar testing (p-value of 0.249). Agreement between the physician diagnosis and the ChatGPT-4 diagnosis was moderate, with a Cohen's kappa of 0.57 (95% CI: 0.35-0.79).

Conclusion: These results suggest that ChatGPT-4 achieved diagnostic accuracy comparable to board-certified physicians in the context of critically ill patients admitted to a community medical intensive care unit. Furthermore, the agreement was only moderate, suggesting that there may be complementary ways of combining the diagnostic acumen of physicians and ChatGPT-4 to improve overall accuracy. A prospective study would be necessary to determine whether ChatGPT-4 could improve patient outcomes as a point-of-care tool at the time of admission.
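
The abstract reports a McNemar test and a Cohen's kappa but does not give the underlying paired 2x2 table. The Python sketch below shows how both statistics are computed from such a table. The counts are hypothetical: they were chosen only to be consistent with the reported accuracies (106/120 and 102/120) and are not data from the study. The sketch also assumes the McNemar test was run without continuity correction and that kappa was computed on correct versus incorrect diagnoses, neither of which the abstract states.

```python
import math

# HYPOTHETICAL paired counts (not from the paper), chosen to match the
# reported accuracies of 88.3% (physicians) and 85.0% (ChatGPT-4), n = 120.
BOTH_CORRECT = 98        # physician correct, ChatGPT-4 correct
PHYSICIAN_ONLY = 8       # physician correct, ChatGPT-4 incorrect
CHATGPT_ONLY = 4         # physician incorrect, ChatGPT-4 correct
BOTH_INCORRECT = 10      # both incorrect


def mcnemar_uncorrected(b, c):
    """McNemar chi-square on the discordant counts, no continuity correction."""
    stat = (b - c) ** 2 / (b + c)
    # Chi-square survival function for 1 degree of freedom:
    # P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]]."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the row and column marginals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


n_cases = BOTH_CORRECT + PHYSICIAN_ONLY + CHATGPT_ONLY + BOTH_INCORRECT
physician_accuracy = (BOTH_CORRECT + PHYSICIAN_ONLY) / n_cases
chatgpt_accuracy = (BOTH_CORRECT + CHATGPT_ONLY) / n_cases
stat, p_value = mcnemar_uncorrected(PHYSICIAN_ONLY, CHATGPT_ONLY)
kappa = cohen_kappa(BOTH_CORRECT, PHYSICIAN_ONLY, CHATGPT_ONLY, BOTH_INCORRECT)

print(f"physician accuracy: {physician_accuracy:.3f}")
print(f"ChatGPT-4 accuracy: {chatgpt_accuracy:.3f}")
print(f"McNemar chi-square: {stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

With these illustrative counts the p-value and kappa come out close to the reported values, but that agreement does not confirm the study's actual table or test variant. McNemar's test uses only the discordant pairs (cases where exactly one rater was correct), which is why two raters with similar accuracy can still show only moderate agreement.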
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations