This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Abstract 4134824: Evaluation of ChatGPT-4.0 and Google Bard's Capabilities in Clinical Decision Support in Cardiac Electrophysiology
Citations: 0
Authors: 5
Year: 2024
Abstract
Background: ChatGPT-4.0 and Bard have shown clinical decision support (CDS) potential in general medicine, but their role in cardiac electrophysiology (EP) is unknown. This study aims to evaluate ChatGPT's and Bard's CDS potential by assessing their accuracy in multiple-choice questions (MCQs), guideline recommendations (GRs), and treatment (Tx) suggestions.

Methods: The two chatbots were tested with 15 clinical vignettes (CVs) and 47 case-related MCQs from Heart Rhythm Case Reports, focusing on ablation, arrhythmia, and CIED management. CVs included narrative descriptions of diagnostic imaging results. Three tasks were performed:
1) Generating GRs, rated 0 if incorrect or correct but irrelevant to the primary problem, 0.5 if correct for the primary problem, and 1 if case-specific (CS), i.e., relevant to both the primary problem and concomitant conditions (e.g., atrial fibrillation with heart failure).
2) Suggesting Tx steps, scored 0 for incorrect, 0.5 for correct, and 1 for CS. Tx was deemed correct if referenced in the case or guidelines, and CS if actually used in the case. For Tx responses that were not CS, a prompt was provided before reassessment; the prompt included one similar CV and its Tx from PubMed case reports.
3) Answering MCQs, rated 1 for correct and 0 for incorrect.
Welch's t-test was used for analysis.

Results: Bard outperformed ChatGPT in generating CS-GRs (P = 0.01). However, there was no significant difference in CS-Tx suggestions with a prompt (P = 0.12, Figure 1C) or without one (P = 0.59, Figure 1A). When prompted after non-CS-Tx responses, ChatGPT improved significantly from 0.66 to 0.93 (P = 0.02), suggesting an enhanced ability to provide CS-Tx plans post-prompt. In contrast, Bard showed no notable improvement (0.73 vs. 0.76, P = 0.79, Figure 1B). Both chatbots demonstrated similar MCQ accuracy, with scores below 70%, indicating EP training gaps or the need for prompts to activate existing knowledge.

Conclusion: This study showed Bard's superiority in generating GRs and ChatGPT's marked improvement in suggesting Tx when external knowledge is provided, revealing their CDS potential in specialized fields.
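The abstract reports Welch's t-test for comparing the chatbots' scores. As a minimal illustrative sketch of that comparison (not the study's actual analysis code), the snippet below applies Welch's t-test via scipy's ttest_ind with equal_var=False; the score arrays are hypothetical placeholders on the abstract's 0/0.5/1 scale, not the study's data.

```python
# Minimal sketch of the comparison described in the abstract: Welch's t-test
# on per-vignette scores. Placeholder data only; not the study's results.
from scipy.stats import ttest_ind

# Hypothetical case-specific Tx scores (0 = incorrect, 0.5 = correct,
# 1 = case-specific) for 15 clinical vignettes per chatbot.
chatgpt_scores = [1, 0.5, 1, 0.5, 0, 1, 1, 0.5, 1, 0.5, 1, 0, 1, 0.5, 1]
bard_scores = [0.5, 1, 0.5, 1, 0.5, 0.5, 1, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 1]

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = ttest_ind(chatgpt_scores, bard_scores, equal_var=False)
print(f"t = {t_stat:.3f}, P = {p_value:.3f}")
```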
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations