This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Abstract 4134824: Evaluation of ChatGPT-4.0 and Google Bard's Capabilities in Clinical Decision Support in Cardiac Electrophysiology
Citations: 0
Authors: 5
Year: 2024
Abstract
Background: ChatGPT-4.0 and Bard have shown clinical decision support (CDS) potential in general medicine, but their role in cardiac electrophysiology (EP) is unknown. This study aims to evaluate ChatGPT's and Bard's CDS potential by assessing their accuracy in multiple-choice questions (MCQs), guideline recommendations (GRs), and treatment (Tx) suggestions.

Methods: The two chatbots were tested with 15 clinical vignettes (CVs) and 47 case-related MCQs from Heart Rhythm Case Reports, focusing on ablation, arrhythmia, and CIED management. CVs included narrative descriptions of diagnostic imaging results. Three tasks were performed:
1) Generating GRs, rated 0 if incorrect or correct but irrelevant to the primary problem, 0.5 if correct for the primary problem, and 1 if case-specific (CS), i.e., relevant to both the primary problem and concomitant conditions (e.g., atrial fibrillation with heart failure).
2) Suggesting Tx steps, scored 0 for incorrect, 0.5 for correct, and 1 for CS. Tx was deemed correct if referenced in the case or guidelines, and CS if actually used in the case. For Tx responses that were not CS, a prompt was provided before reassessment; the prompt included one similar CV and its Tx from PubMed case reports.
3) Answering MCQs, rated 1 for correct and 0 for incorrect.
Welch's t-test was used for analysis.

Results: Bard outperformed ChatGPT in generating CS-GRs (P = 0.01). However, there was no significant difference in CS-Tx suggestions with a prompt (P = 0.12, Figure 1C) or without one (P = 0.59, Figure 1A). When prompted after non-CS-Tx responses, ChatGPT improved significantly from 0.66 to 0.93 (P = 0.02), suggesting an enhanced ability to provide CS-Tx plans post-prompt. In contrast, Bard showed no notable improvement (0.73 vs. 0.76, P = 0.79, Figure 1B). Both chatbots demonstrated similar MCQ accuracy, with scores below 70%, indicating EP training gaps or the need for prompts to activate existing knowledge.

Conclusion: This study showed Bard's superiority in generating GRs and ChatGPT's marked improvement in suggesting Tx when external knowledge is provided, revealing their CDS potential in specialized fields.
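The abstract reports Welch's t-test for comparing the chatbots' scores. As a minimal illustrative sketch of that comparison (not the study's actual analysis code), the snippet below applies Welch's t-test via scipy's ttest_ind with equal_var=False; the score arrays are hypothetical placeholders on the abstract's 0/0.5/1 scale, not the study's data.

```python
# Minimal sketch of the comparison described in the abstract: Welch's t-test
# on per-vignette scores. Placeholder data only; not the study's results.
from scipy.stats import ttest_ind

# Hypothetical case-specific Tx scores (0 = incorrect, 0.5 = correct,
# 1 = case-specific) for 15 clinical vignettes per chatbot.
chatgpt_scores = [1, 0.5, 1, 0.5, 0, 1, 1, 0.5, 1, 0.5, 1, 0, 1, 0.5, 1]
bard_scores = [0.5, 1, 0.5, 1, 0.5, 0.5, 1, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 1]

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = ttest_ind(chatgpt_scores, bard_scores, equal_var=False)
print(f"t = {t_stat:.3f}, P = {p_value:.3f}")
```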
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations