OpenAlex · Updated hourly · Last updated: 21.03.2026, 05:12

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

AI in the Hot Seat: Head-to-Head Comparison of Large Language Models and Cardiologists in Emergency Scenarios

2026 · 0 citations · Medical Sciences · Open Access

Citations: 0 · Authors: 10 · Year: 2026

Abstract

<b>Background:</b> The clinical applicability of large language models (LLMs) in high-stakes cardiac emergencies remains unexplored. This study evaluated how well advanced LLMs perform in managing complex catheterization laboratory (cath lab) scenarios and compared their performance with that of interventional cardiologists. <b>Methods and Results:</b> A cross-sectional study was conducted from 20 June to 2 December 2024. Twelve challenging inferior myocardial infarction scenarios were presented to seven LLMs (ChatGPT, Gemini, LLAMA, Qwen, Bing, Claude, DeepSeek) and five early-career interventional cardiologists. Responses were standardized, anonymized, and evaluated by thirty experienced interventional cardiologists. Performance comparisons were analyzed using a linear mixed-effects model with correlation and reliability statistics. Physicians had an average reference score of 80.68 (95% CI 76.3-85.0). Among LLMs, ChatGPT ranked highest (87.4, 95% CI 82.5-92.3), followed by Claude (80.8, 95% CI 75.7-85.9) and DeepSeek (78.7, 95% CI 72.9-84.6). LLAMA (73.7), Qwen (66.2), and Bing (64.3) ranked lower, while Gemini scored the lowest (59.0). ChatGPT scored higher than the early-career physician comparator group (difference 6.69, 95% CI 0.00-13.37; <i>p</i> < 0.05), whereas Gemini, LLAMA, Qwen, and Bing performed significantly worse; Claude and DeepSeek showed no significant difference. <b>Conclusions:</b> This expanded assessment reveals significant variability in LLM performance. In this simulated setting, ChatGPT demonstrated performance comparable to that of early-career interventional cardiologists. These results suggest that LLMs could serve as supplementary decision-support tools in interventional cardiology under simulated conditions.
