This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MON-822 GLP-1 and Anesthesia: Queries to AI Chatbots
Citations: 0
Authors: 8
Year: 2025
Abstract
Abstract Disclosure: E. Pan: None. G. Wu: None. S. Sidhu: None. A. Sidhu: None. A. Ashok: None. D. Kim: None. B. Hoang: None. V.V. Toram: None. Background: One in eight adults in the US (about 12%) report having taken a GLP-1 agonist at least once, including 6% who report currently taking one. Research has shown an association between GLP-1 agonists and nausea, vomiting, and delayed gastric emptying, which can be dangerous during anesthesia and following procedures. Purpose: To evaluate whether AI models can correctly identify anesthesia-related risks associated with GLP-1 agonist drugs. Methods: Five questions about GLP-1, some targeting possible anesthesia risks, were posed in English, Chinese, Hindi, Japanese, Korean, and Punjabi to five chatbots: Claude, Coral, Gemini, GPT-4o Mini, and GPT-4o. The chatbots' textual responses were recorded and scored on a scale of 1 to 5 with the help of native speakers of each language. All raters jointly scored the English responses and were blinded to which chatbot produced each response to eliminate bias. Results: English and Chinese outputs consistently scored higher than responses in Hindi, Japanese, Korean, and Punjabi; of these four languages, Punjabi consistently received the lowest scores. All English responses received a score of 5 except one answer from Gemini, which received a 4, indicating that the chatbots responded consistently and accurately to queries about GLP-1 and anesthesia and were aware of possible risks. By contrast, Punjabi responses frequently scored 3 or below, and one chatbot, Coral, produced incomprehensible Punjabi responses, earning a score of 1 on all of them. This indicates a lack of knowledge and a loss of information in these responses, which the chatbot may have mistranslated from English. Responses in the other languages scored between these two extremes.
Among the chatbots, GPT-4o consistently scored the highest, while Coral scored the lowest and was the most variable; Coral was also the only chatbot to generate responses with a score of 1. Conclusion: The clear hierarchy in language accuracy (English, Chinese > Hindi, Korean, Japanese > Punjabi) suggests a significant disparity in the quality of medical information provided across languages. To narrow this gap, chatbots should be trained on diverse datasets in multiple languages to ensure that accurate information reaches patients. In particular, languages with fewer speakers require more training and better data, highlighting disparities in health information that need to be corrected. Presentation: Monday, July 14, 2025
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations