This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Accuracy of ChatGPT and DeepSeek in answering clinical questions from the 2025 Society for Cardiovascular Angiography & Interventions/Heart Rhythm Society left atrial appendage occlusion guidelines
0 Citations
5 Authors
2026 Year
Abstract
Objective: To evaluate the accuracy of ChatGPT and DeepSeek in answering guideline-based clinical questions in cardiology.

Methods: In August 2025, responses generated from four large language models to eight clinical questions based on the 2025 Society for Cardiovascular Angiography & Interventions/Heart Rhythm Society guidelines were evaluated. Three cardiologists independently rated accuracy using a six-point Likert scale: (a) completely incorrect; (b) more incorrect than correct; (c) nearly equally correct and incorrect; (d) more correct than incorrect; (e) nearly all correct; and (f) completely correct. Reproducibility (Fleiss' kappa coefficient, five repeated queries) and inter-rater reliability (intraclass correlation coefficient) were assessed.

Results: The median (interquartile range) accuracy scores were 5.5 (5, 6) for ChatGPT-5, 6 (5, 6) for ChatGPT-4o, and 5 (4, 6) for both DeepSeek-R1 and DeepSeek-V3, with a significant overall difference (p < 0.001). Pairwise comparisons showed significantly higher accuracy for ChatGPT models than for DeepSeek models (all p < 0.001), whereas no significant differences were observed between ChatGPT-5 and ChatGPT-4o (p = 0.518) or between DeepSeek-R1 and DeepSeek-V3 (p = 0.812). Reproducibility (Fleiss' kappa coefficient) was excellent for ChatGPT-5 (0.803) and good for ChatGPT-4o (0.574), DeepSeek-R1 (0.577), and DeepSeek-V3 (0.618). Overall inter-rater reliability was moderate (intraclass correlation coefficient = 0.463).

Conclusions: ChatGPT and DeepSeek demonstrated high accuracy and reproducibility but moderate inter-rater reliability, necessitating further validation for educational use.
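The reproducibility figures above are Fleiss' kappa values, a chance-corrected measure of agreement among more than two ratings per item. As a generic illustration (not the study's analysis or data), a minimal sketch of the statistic, assuming an input table of per-item category counts, might look like this:

```python
def fleiss_kappa(table):
    """Fleiss' kappa for table[i][j] = number of raters (here: repeated
    queries) assigning item i to category j. Every row must sum to the
    same number of raters n."""
    N = len(table)                  # number of items (e.g. questions)
    n = sum(table[0])               # ratings per item
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P) / N              # mean observed agreement
    # Marginal category proportions and expected chance agreement
    k = len(table[0])
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 2 questions, 3 repeated queries, 2 score categories,
# with all queries agreeing on each question (perfect agreement).
print(fleiss_kappa([[3, 0], [0, 3]]))   # -> 1.0
```

A kappa near 0.8, as reported for ChatGPT-5, indicates that repeated queries landed in the same accuracy category far more often than chance alone would predict.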
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,490 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,376 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,832 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,553 citations