OpenAlex · Updated hourly · Last updated: 21.04.2026, 18:43

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

S1719 Reliability Concerns: Can AI Interpret Nuanced Medical and Ethical Scenarios in the Field of Gastroenterology

2023 · 0 citations · The American Journal of Gastroenterology

Citations: 0 · Authors: 9 · Year: 2023

Abstract

Introduction: AI tools like ChatGPT and Google Bard are gaining traction in healthcare, notably gastroenterology, offering benefits such as vast data knowledge, swift responses, and easy access. However, their reliability in medical and ethical decision-making is uncertain. They provide information effectively but cannot fully emulate the nuanced understanding and empathy of human medical professionals. Especially in ethical decisions, which demand comprehension of personal and contextual factors, these AI tools should serve as adjuncts, not replacements, to expert human judgment.
Methods: The study evaluated the medical and ethical dependability of two widely used chatbots, ChatGPT and Google BARD, within gastroenterology. A questionnaire was administered to both bots, and their responses were rated on a 1-10 Likert scale where 1 indicated exceptional accuracy. To ensure unbiased evaluation, two independent assessors analyzed each bot's answers. The goal was to systematically evaluate the chatbots' competencies and trustworthiness using this performance review. The use of two evaluators and of the Likert scale aimed to mitigate potential bias and thereby strengthen the validity of the findings.
Results: Our study compared the dependability of ChatGPT and Google BARD in medical management scenarios. ChatGPT scored 21% (P < 0.01), and Google BARD scored 19% (P = 0.022) in terms of reliability when compared with standardized practices. Between the two chatbots, ChatGPT had a higher score than Google BARD (67% vs. 41%, P = 0.034). However, both chatbots' reliability scores were inferior to standard practice. This underscores the importance of reliability in developing gastroenterology-focused chatbots and the need for ongoing research and improvements in this field (Figure 1, Table 1).
Conclusion: Despite potential benefits, AI tools like ChatGPT and Google Bard currently fall short in assisting medical and ethical decisions in gastroenterology, as shown by lower reliability scores against standardized guidelines. Although ChatGPT marginally outperformed Google Bard, both fail to match the nuanced understanding and empathy of human healthcare professionals. This underlines the crucial need for AI dependability and the importance of ongoing research to enhance these technologies, ensuring they support, not supplant, human judgment in decision-making.

Figure 1. Chatbot result outcome on reliability.

Table 1. Chatbot reliability questions scale outcome. Scores are on a Likert scale of 1-10 (1 being low and 10 being high), compared to standardized practice.

| Reliability question | ChatGPT | Google BARD | P-value |
| I am 30 yrs. old, can I get a colonoscopy? | 4 | 2 | P = 0.012 |
| How frequently should the patient be reviewed with liver cirrhosis, and what specific indicators of progress should we look for? | 4.6 | 2.5 | P < 0.05 |
| In the case of pancreatic necrosis, what would be the long-term management strategy? | 3.9 | 2.9 | P = 0.002 |
| How would you rank the proposed treatments for IBD effectiveness and potential side effects for this patient? | 5.5 | 3.5 | P = 0.011 |
| Based on the patient's potential colon cancer diagnosis, what are the most effective treatment options available? | 6.5 | 5.5 | P = 0.047 |

Similar works