This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI in the Hot Seat: Head-to-Head Comparison of Large Language Models and Cardiologists in Emergency Scenarios
Citations: 0
Authors: 10
Year: 2026
Abstract
<b>Background:</b> The clinical applicability of large language models (LLMs) in high-stakes cardiac emergencies remains unexplored. This study evaluated how well advanced LLMs perform in managing complex catheterization laboratory (cath lab) scenarios and compared their performance with that of interventional cardiologists. <b>Methods and Results:</b> A cross-sectional study was conducted from 20 June to 2 December 2024. Twelve challenging inferior myocardial infarction scenarios were presented to seven LLMs (ChatGPT, Gemini, LLAMA, Qwen, Bing, Claude, DeepSeek) and five early-career interventional cardiologists. Responses were standardized, anonymized, and evaluated by thirty experienced interventional cardiologists. Performance comparisons were analyzed using a linear mixed-effects model with correlation and reliability statistics. Physicians had an average reference score of 80.68 (95% CI 76.3-85.0). Among LLMs, ChatGPT ranked highest (87.4, 95% CI 82.5-92.3), followed by Claude (80.8, 95% CI 75.7-85.9) and DeepSeek (78.7, 95% CI 72.9-84.6). LLAMA (73.7), Qwen (66.2), and Bing (64.3) ranked lower, while Gemini scored the lowest (59.0). ChatGPT scored higher than the early-career physician comparator group (difference 6.69, 95% CI 0.00-13.37; <i>p</i> < 0.05), whereas Gemini, LLAMA, Qwen, and Bing performed significantly worse; Claude and DeepSeek showed no significant difference. <b>Conclusions:</b> This expanded assessment reveals significant variability in LLM performance. In this simulated setting, ChatGPT demonstrated performance comparable to that of early-career interventional cardiologists. These results suggest that LLMs could serve as supplementary decision-support tools in interventional cardiology under simulated conditions.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations
Authors
Institutions
- Northwestern University (US)
- Intel (United States) (US)
- Northwestern Medicine (US)
- Stanford University (US)
- Istanbul Eye Hospital (TR)
- State Hospital (GB)
- Sivas State Hospital (TR)
- Education Training And Research (US)
- Sağlık Bilimleri Üniversitesi (TR)
- University of Health Sciences Antigua (AG)
- Dr. Siyami Ersek Göğüs Kalp Ve Damar Cerrahisi Eğitim Ve Araştırma Hastanesi (TR)
- University of Maryland, Baltimore (US)