This is an overview page with metadata for this scientific work. The full article is available from the publisher.
S2155 Comparative Evaluation of Various Artificially Intelligent Chatbots for Management Recommendations of Common Gastroenterology Diseases
Citations: 0
Authors: 1
Year: 2024
Abstract
Introduction: The integration of artificial intelligence (AI) in healthcare, particularly within gastroenterology, holds the potential to significantly enhance diagnostic accuracy, patient management, and clinical workflows. Advanced AI tools such as ChatGPT-4o, Gemini Pro, and Perplexity Pro are capable of interpreting endoscopic images, analyzing clinical data, and automating various administrative tasks. These advancements can streamline medical procedures and personalize treatment plans.

Methods: This study aimed to compare the accuracy and reliability of ChatGPT-4o, Gemini Pro, and Perplexity Pro in responding to gastroenterology-related medical management questions, including those related to ulcerative colitis, alcoholic hepatitis, peptic ulcer disease, and others. Each chatbot's responses were independently reviewed by itself and by the other two chatbots using an answer key derived from current literature on UpToDate. Responses were rated on a 10-point Likert scale, with 10 indicating the highest accuracy. Overall statistical significance was assessed with the Friedman test, while pairwise comparisons between chatbots were conducted with the Mann-Whitney U test.

Results: The analysis showed no significant difference in the chatbots' abilities to answer medical management questions accurately and reliably when using the Friedman test (P = 0.06–1.0). Despite the lack of statistical significance, Perplexity Pro consistently achieved higher Likert scores than ChatGPT-4o and Gemini Pro. In pairwise comparisons, however, the Mann-Whitney U test revealed significant differences between ChatGPT-4o and Perplexity Pro (P = 0.001) and between Gemini Pro and Perplexity Pro (P = 0.0003).

Conclusion: AI chatbots such as ChatGPT-4o, Gemini Pro, and Perplexity Pro demonstrate substantial potential to enhance gastroenterology practice. While the Friedman test showed no significant difference (P = 0.06–1.0), pairwise comparisons indicated superior performance by Perplexity Pro in answering gastroenterology-related medical questions. The comparable performance among these chatbots, however, underscores the need for continued development and evaluation to ensure their effectiveness in clinical settings. Future studies should focus on refining these AI tools and exploring their integration into clinical practice, such as answering patient portal messages. Of note, AI was used in the development of this abstract (see Figure 1, Table 1).

Figure 1. ChatGPT-4o's response to "What is the management for eosinophilic esophagitis?"

Table 1. AI Chatbot Likert Scores (10-point scale; 10 = highest accuracy)

| Question | ChatGPT-4o | Gemini Pro | Perplexity Pro | P-Value (Friedman) | P-Value (GPT vs G) | P-Value (GPT vs P) | P-Value (G vs P) |
|---|---|---|---|---|---|---|---|
| What's the Management of Ulcerative Colitis? | 8.33 | 8.00 | 8.67 | 0.07 | 0.139 | 0.001 | 0.0003 |
| What's the Management of Alcoholic Hepatitis? | 8.33 | 8.33 | 9.00 | 0.06 | | | |
| What's the Management of Peptic Ulcer Disease? | 8.33 | 8.33 | 8.33 | 1.00 | | | |
| What's the Management of Celiac Disease? | 8.33 | 7.00 | 9.00 | 0.06 | | | |
| What's the Management of Choledocholithiasis? | 8.00 | 7.00 | 8.67 | 0.06 | | | |
| What's the Management of Irritable Bowel Syndrome? | 8.33 | 7.67 | 9.00 | 0.22 | | | |
| What's the Management of Eosinophilic Esophagitis? | 8.00 | 7.33 | 8.67 | 0.06 | | | |
| What's the Management of an Upper GI Bleed? | 9.00 | 8.67 | 9.00 | 0.37 | | | |
| What's the Management of Acute Pancreatitis? | 8.33 | 8.33 | 9.00 | 0.91 | | | |
| What's the Workup for Dysphagia? | 8.33 | 8.33 | 8.67 | 0.72 | | | |
| What's the Workup for GERD? | 7.67 | 7.67 | 8.67 | 0.37 | | | |

GPT indicates ChatGPT-4o, G indicates Gemini Pro, and P indicates Perplexity Pro.
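The statistical approach described in the Methods (Friedman test for the overall comparison, Mann-Whitney U tests for pairwise comparisons of Likert ratings) can be illustrated with a minimal sketch. The abstract does not specify the analysis software; the snippet below assumes SciPy in Python, and the rating values are placeholders, not the study's data.

```python
# Minimal sketch of the analysis described above (assumed SciPy implementation).
# Ratings are illustrative 10-point Likert scores, one per reviewing chatbot.
from scipy.stats import friedmanchisquare, mannwhitneyu

# Hypothetical ratings for one question (three reviewers per chatbot).
chatgpt_4o     = [8, 9, 8]
gemini_pro     = [8, 8, 8]
perplexity_pro = [9, 8, 9]

# Overall non-parametric comparison across the three chatbots.
stat, p_overall = friedmanchisquare(chatgpt_4o, gemini_pro, perplexity_pro)
print(f"Friedman test: statistic={stat:.3f}, p={p_overall:.3f}")

# Pairwise non-parametric comparisons between chatbots.
pairs = [
    ("GPT vs G", chatgpt_4o, gemini_pro),
    ("GPT vs P", chatgpt_4o, perplexity_pro),
    ("G vs P", gemini_pro, perplexity_pro),
]
for label, a, b in pairs:
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"Mann-Whitney U ({label}): U={u:.1f}, p={p:.3f}")
```

With the study's full set of ratings per question in place of the placeholder lists, this pattern would reproduce the per-question Friedman P-values and the pairwise P-values reported in Table 1.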
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations