This is an overview page with metadata for this scientific work. The full article is available from the publisher.
S2155 Comparative Evaluation of Various Artificially Intelligent Chatbots for Management Recommendations of Common Gastroenterology Diseases
Citations: 0
Authors: 1
Year: 2024
Abstract
Introduction: The integration of artificial intelligence (AI) in healthcare, particularly within gastroenterology, holds the potential to significantly enhance diagnostic accuracy, patient management, and clinical workflows. Advanced AI tools such as ChatGPT-4o, Gemini Pro, and Perplexity Pro are capable of interpreting endoscopic images, analyzing clinical data, and automating various administrative tasks. These advancements can streamline medical procedures and personalize treatment plans.

Methods: This study aimed to compare the accuracy and reliability of ChatGPT-4o, Gemini Pro, and Perplexity Pro in responding to gastroenterology-related medical management questions, including those related to ulcerative colitis, alcoholic hepatitis, peptic ulcer disease, and others. Each chatbot's responses were independently reviewed by itself and by the other two chatbots using an answer key derived from current literature on UpToDate. Responses were rated on a 10-point Likert scale, with 10 indicating the highest accuracy. Overall statistical significance was assessed with the Friedman test, while pairwise comparisons between chatbots were conducted with the Mann-Whitney U test.

Results: The analysis showed no significant difference in the chatbots' abilities to answer medical management questions accurately and reliably when using the Friedman test (P = 0.06–1.0). Despite the lack of statistical significance, Perplexity Pro consistently achieved higher Likert scores than ChatGPT-4o and Gemini Pro. In pairwise comparisons, however, the Mann-Whitney U test revealed significant differences between ChatGPT-4o and Perplexity Pro (P = 0.001) and between Gemini Pro and Perplexity Pro (P = 0.0003).

Conclusion: AI chatbots such as ChatGPT-4o, Gemini Pro, and Perplexity Pro demonstrate substantial potential to enhance gastroenterology practice. While the Friedman test showed no significant difference (P = 0.06–1.0), pairwise comparisons indicated superior performance by Perplexity Pro in answering gastroenterology-related medical questions. The comparable performance among these chatbots, however, underscores the need for continued development and evaluation to ensure their effectiveness in clinical settings. Future studies should focus on refining these AI tools and exploring their integration into clinical practice, such as answering patient portal messages. Of note, AI was used in the development of this abstract (see Figure 1, Table 1).

Figure 1. ChatGPT-4o's response to "What is the management for eosinophilic esophagitis?"

Table 1. AI Chatbot Likert Scores (10-point scale; 10 = highest accuracy)

| Question | ChatGPT-4o | Gemini Pro | Perplexity Pro | P-Value (Friedman) | P-Value (GPT vs G) | P-Value (GPT vs P) | P-Value (G vs P) |
|---|---|---|---|---|---|---|---|
| What's the Management of Ulcerative Colitis? | 8.33 | 8.00 | 8.67 | 0.07 | 0.139 | 0.001 | 0.0003 |
| What's the Management of Alcoholic Hepatitis? | 8.33 | 8.33 | 9.00 | 0.06 | | | |
| What's the Management of Peptic Ulcer Disease? | 8.33 | 8.33 | 8.33 | 1.00 | | | |
| What's the Management of Celiac Disease? | 8.33 | 7.00 | 9.00 | 0.06 | | | |
| What's the Management of Choledocholithiasis? | 8.00 | 7.00 | 8.67 | 0.06 | | | |
| What's the Management of Irritable Bowel Syndrome? | 8.33 | 7.67 | 9.00 | 0.22 | | | |
| What's the Management of Eosinophilic Esophagitis? | 8.00 | 7.33 | 8.67 | 0.06 | | | |
| What's the Management of an Upper GI Bleed? | 9.00 | 8.67 | 9.00 | 0.37 | | | |
| What's the Management of Acute Pancreatitis? | 8.33 | 8.33 | 9.00 | 0.91 | | | |
| What's the Workup for Dysphagia? | 8.33 | 8.33 | 8.67 | 0.72 | | | |
| What's the Workup for GERD? | 7.67 | 7.67 | 8.67 | 0.37 | | | |

GPT indicates ChatGPT-4o, G indicates Gemini Pro, and P indicates Perplexity Pro.
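The statistical approach described in the Methods (Friedman test for the overall comparison, Mann-Whitney U tests for pairwise comparisons of Likert ratings) can be illustrated with a minimal sketch. The abstract does not specify the analysis software; the snippet below assumes SciPy in Python, and the rating values are placeholders, not the study's data.

```python
# Minimal sketch of the analysis described above (assumed SciPy implementation).
# Ratings are illustrative 10-point Likert scores, one per reviewing chatbot.
from scipy.stats import friedmanchisquare, mannwhitneyu

# Hypothetical ratings for one question (three reviewers per chatbot).
chatgpt_4o     = [8, 9, 8]
gemini_pro     = [8, 8, 8]
perplexity_pro = [9, 8, 9]

# Overall non-parametric comparison across the three chatbots.
stat, p_overall = friedmanchisquare(chatgpt_4o, gemini_pro, perplexity_pro)
print(f"Friedman test: statistic={stat:.3f}, p={p_overall:.3f}")

# Pairwise non-parametric comparisons between chatbots.
pairs = [
    ("GPT vs G", chatgpt_4o, gemini_pro),
    ("GPT vs P", chatgpt_4o, perplexity_pro),
    ("G vs P", gemini_pro, perplexity_pro),
]
for label, a, b in pairs:
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"Mann-Whitney U ({label}): U={u:.1f}, p={p:.3f}")
```

With the study's full set of ratings per question in place of the placeholder lists, this pattern would reproduce the per-question Friedman P-values and the pairwise P-values reported in Table 1.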
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations