This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Real-World Performance of Large Language Models in Emergency Department Chest Pain Triage
Citations: 5 · Authors: 18 · Year: 2024
Abstract

Background: Large Language Models (LLMs) are increasingly being explored for medical applications, particularly in emergency triage, where rapid and accurate decision-making is crucial. This study evaluates the diagnostic performance of two prominent Chinese LLMs, “Tongyi Qianwen” and “Lingyi Zhihui,” alongside a newly developed model, MediGuide-14B, and compares them with human medical experts in emergency chest pain triage.

Methods: This retrospective study, conducted at the emergency centers of Peking University Third Hospital from June 2021 to May 2023, included 11,428 patients presenting with chest pain. Data were extracted from electronic medical records, excluding diagnostic test results, and used to assess the models and the human experts in a double-blind setup. Performance was evaluated by accuracy, sensitivity, and specificity in diagnosing acute coronary syndrome (ACS).

Findings: “Lingyi Zhihui” achieved a diagnostic accuracy of 76.40%, a sensitivity of 90.99%, and a specificity of 70.15%. “Tongyi Qianwen” achieved an accuracy of 61.11%, a sensitivity of 91.67%, and a specificity of 47.95%. MediGuide-14B outperformed both models with an accuracy of 84.52%, combining high sensitivity with commendable specificity. Human experts achieved higher accuracy (86.37%) and specificity (89.26%) than the LLMs, but lower sensitivity. The LLMs also produced triage decisions significantly faster than the human experts, although the reliability and completeness of their recommendations varied.

Interpretation: The study confirms the potential of LLMs to enhance emergency medical diagnostics, particularly in resource-limited settings. MediGuide-14B, trained specifically for medical applications, shows considerable promise for clinical integration. However, the variability in performance underscores the need for further fine-tuning and contextual adaptation to improve reliability and efficacy in medical use. Future research should focus on optimizing LLMs for specific medical tasks and integrating them with conventional medical systems to realize their full potential in real-world settings.
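The three reported metrics are linked. Assuming each was computed on the same set of patients, accuracy is the prevalence-weighted average of sensitivity and specificity: accuracy = p · sensitivity + (1 − p) · specificity, where p is the fraction of patients with ACS. The minimal Python sketch below computes the metrics from confusion-matrix counts and inverts that identity to estimate the ACS prevalence implied by the reported figures. The class `ConfusionMatrix` and the function `implied_prevalence` are hypothetical names for illustration, not code from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary ACS-vs-non-ACS counts (hypothetical structure for illustration)."""
    tp: int  # ACS patients correctly flagged as ACS
    fn: int  # ACS patients missed
    tn: int  # non-ACS patients correctly cleared
    fp: int  # non-ACS patients incorrectly flagged as ACS

    @property
    def sensitivity(self) -> float:
        # Share of true ACS cases that were caught.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-ACS cases that were correctly cleared.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all patients classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def implied_prevalence(accuracy: float, sensitivity: float, specificity: float) -> float:
    """Solve accuracy = p * sens + (1 - p) * spec for the ACS prevalence p.

    Assumes all three metrics come from one shared evaluation set
    and that sensitivity != specificity.
    """
    return (accuracy - specificity) / (sensitivity - specificity)


# Figures reported in the abstract, as fractions: (accuracy, sensitivity, specificity).
reported = {
    "Lingyi Zhihui": (0.7640, 0.9099, 0.7015),
    "Tongyi Qianwen": (0.6111, 0.9167, 0.4795),
}

for model, (acc, sens, spec) in reported.items():
    p = implied_prevalence(acc, sens, spec)
    print(f"{model}: implied ACS prevalence ~ {p:.3f}")
```

Both LLM rows imply a prevalence of roughly 0.30, which is consistent with the figures coming from a single shared evaluation set. This is a back-of-the-envelope consistency check, not a value reported by the authors.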
Similar works
A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation
1987 · 49,064 citations
APACHE II
1985 · 13,496 citations
Cochrane Database of Systematic Reviews
2003 · 11,282 citations
Epidemiology of severe sepsis in the United States: Analysis of incidence, outcome, and associated costs of care
2001 · 8,569 citations
The injury severity score: a method for describing patients with multiple injuries and evaluating emergency care.
1974 · 8,023 citations
Institutions
- King University (US)
- Peking University Third Hospital (CN)
- Peking University (CN)
- Tianjin University (CN)
- Chinese Academy of Medical Sciences & Peking Union Medical College (CN)
- First Affiliated Hospital of Hebei Medical University (CN)
- Peking University First Hospital (CN)
- Peking University People's Hospital (CN)