This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large Language Models in Generating Differential Diagnoses in the Emergency Department: A Comparative Study of ChatGPT, Copilot, and Emergency Physician
Citations: 0
Authors: 5
Year: 2025
Abstract
Aim: Accurate diagnosis in emergency departments relies heavily on clinical decision-making, yet cognitive errors contribute to a significant proportion of diagnostic mistakes. Since their launch, Generative Pre-trained Transformer-4 (GPT-4) based large language models (LLMs) have been reshaping healthcare, offering improvements in diagnostic accuracy, treatment planning, and patient care. This study evaluates the performance of these tools in generating primary and differential diagnoses compared to an experienced emergency medicine (EM) physician. Materials and Methods: We conducted a retrospective cross-sectional study using 468 real-world clinical vignettes from non-trauma adult patients. GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) and Copilot were tasked with generating five differential diagnoses for each vignette. Their accuracy was compared to the diagnoses provided by EM physicians, using discharge diagnoses as the reference. Statistical analysis included descriptive statistics and Cohen's kappa to assess agreement. Results: ChatGPT and Copilot demonstrated high accuracy, with correct diagnoses in the top three positions in 91.9% and 90.2% of cases, respectively, compared to 93.2% for the EM physician. Moderate agreement between the artificial intelligence (AI) tools and the EM physician was observed (kappa: 0.476 for ChatGPT and 0.414 for Copilot). Conclusion: LLM-based generative AI tools show promise as clinical decision support systems, enhancing diagnostic accuracy and assisting less-experienced clinicians. However, they should complement, not replace, human expertise in emergency settings.
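The abstract reports agreement as Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation, not the authors' analysis code. The function name `cohens_kappa` and the binary labels (1 = correct diagnosis in the top three, 0 = not) are assumptions made for illustration, since the abstract does not state how cases were coded, and the example data are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement p_o: share of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement p_e: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch treats that case as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented example data (NOT from the study): 1 = correct within top three, 0 = not.
llm_labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
physician_labels = [1, 1, 0, 0, 1, 1, 1, 1, 0, 1]
kappa = cohens_kappa(llm_labels, physician_labels)
```

On this invented data the raters agree on 8 of 10 cases (p_o = 0.80) while chance agreement is p_e = 0.58, so kappa is about 0.52. That falls in the same "moderate" band as the values the abstract reports, although the numbers themselves are unrelated to the study.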
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations