This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Accuracy of ChatGPT and DeepSeek in answering clinical questions from the 2025 Society for Cardiovascular Angiography & Interventions/Heart Rhythm Society left atrial appendage occlusion guidelines
0 Citations
5 Authors
2026 Year
Abstract
Objective: To evaluate the accuracy of ChatGPT and DeepSeek in answering guideline-based clinical questions in cardiology.

Methods: In August 2025, responses generated from four large language models to eight clinical questions based on the 2025 Society for Cardiovascular Angiography & Interventions/Heart Rhythm Society guidelines were evaluated. Three cardiologists independently rated accuracy using a six-point Likert scale: (a) completely incorrect; (b) more incorrect than correct; (c) nearly equally correct and incorrect; (d) more correct than incorrect; (e) nearly all correct; and (f) completely correct. Reproducibility (Fleiss' kappa coefficient, five repeated queries) and inter-rater reliability (intraclass correlation coefficient) were assessed.

Results: The median (interquartile range) accuracy scores were 5.5 (5, 6) for ChatGPT-5, 6 (5, 6) for ChatGPT-4o, and 5 (4, 6) for both DeepSeek-R1 and DeepSeek-V3, with a significant overall difference (p < 0.001). Pairwise comparisons showed significantly higher accuracy for ChatGPT models than for DeepSeek models (all p < 0.001), whereas no significant differences were observed between ChatGPT-5 and ChatGPT-4o (p = 0.518) or between DeepSeek-R1 and DeepSeek-V3 (p = 0.812). Reproducibility (Fleiss' kappa coefficient) was excellent for ChatGPT-5 (0.803) and good for ChatGPT-4o (0.574), DeepSeek-R1 (0.577), and DeepSeek-V3 (0.618). Overall inter-rater reliability was moderate (intraclass correlation coefficient = 0.463).

Conclusions: ChatGPT and DeepSeek demonstrated high accuracy and reproducibility but moderate inter-rater reliability, necessitating further validation for educational use.
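The reproducibility figures above are Fleiss' kappa values, a chance-corrected measure of agreement among more than two ratings per item. As a generic illustration (not the study's analysis or data), a minimal sketch of the statistic, assuming an input table of per-item category counts, might look like this:

```python
def fleiss_kappa(table):
    """Fleiss' kappa for table[i][j] = number of raters (here: repeated
    queries) assigning item i to category j. Every row must sum to the
    same number of raters n."""
    N = len(table)                  # number of items (e.g. questions)
    n = sum(table[0])               # ratings per item
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P) / N              # mean observed agreement
    # Marginal category proportions and expected chance agreement
    k = len(table[0])
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 2 questions, 3 repeated queries, 2 score categories,
# with all queries agreeing on each question (perfect agreement).
print(fleiss_kappa([[3, 0], [0, 3]]))   # -> 1.0
```

A kappa near 0.8, as reported for ChatGPT-5, indicates that repeated queries landed in the same accuracy category far more often than chance alone would predict.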
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,490 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,553 citations