OpenAlex · Updated hourly · Last updated: 21.03.2026, 00:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

MON-822 GLP-1 and Anesthesia: Queries to AI Chatbots

2025 · 0 citations · Journal of the Endocrine Society · Open Access
Open full text at the publisher

0

Citations

8

Authors

2025

Year

Abstract

Abstract Disclosure: E. Pan: None. G. Wu: None. S. Sidhu: None. A. Sidhu: None. A. Ashok: None. D. Kim: None. B. Hoang: None. V.V. Toram: None.

Background: One in eight adults in the US (around 12%) report having taken a GLP-1 agonist at least once, including 6% who report currently taking such a drug. Research has shown an association between GLP-1 agonists and nausea, vomiting, and delayed gastric emptying, which can be dangerous during anesthesia and following procedures.

Purpose: Evaluate whether AI models can correctly identify anesthesia-related risks associated with GLP-1 agonist drugs.

Methods: Five questions about GLP-1, some targeting possible anesthesia risks, were posed in English, Chinese, Hindi, Japanese, Korean, and Punjabi to five chatbots: Claude, Coral, Gemini, GPT-4o Mini, and GPT-4o. The chatbots' textual responses were recorded and scored on a scale of 1 to 5 with the help of native speakers of each language. Raters scored the English responses jointly and were blinded to which chatbot produced each response to eliminate bias.

Results: English and Chinese outputs consistently scored higher than responses in Hindi, Japanese, Korean, and Punjabi; of these four languages, Punjabi consistently received the lowest scores. All English responses received a score of 5 except one answer by Gemini, which received a 4, indicating that the chatbots responded consistently and accurately to queries about GLP-1 and anesthesia and were aware of possible risks. Punjabi responses, by contrast, frequently received scores of 3 or below, and one chatbot, Coral, produced incomprehensible Punjabi responses, earning a score of 1 on every question. This indicates a lack of knowledge and a loss of information, possibly introduced when the chatbot translated from English. Responses in the other languages scored between these two extremes.
Among the chatbots, GPT-4o consistently scored the highest, while Coral scored the lowest and was the most variable; it was also the only chatbot to generate responses scored 1. Conclusion: The clear hierarchy in language accuracy (English, Chinese > Hindi, Korean, Japanese > Punjabi) suggests a significant disparity in the quality of medical information provided across languages. To narrow the gap, chatbots should be trained on diverse datasets in multiple languages to ensure accurate information reaches patients. Less widely represented languages in particular require more and better training data, highlighting disparities in health information that need to be corrected. Presentation: Monday, July 14, 2025
