This is an overview page with metadata for this scientific work. The full article is available from the publisher.
SAT-690 Chatbots & Obesity: Are the Responses Accurate?
Citations: 0
Authors: 9
Year: 2025
Abstract
Abstract Disclosure: E. Pan: None. G. Wu: None. S. Sidhu: None. A. Sidhu: None. I. Chim: None. A. Ashok: None. A. Madala: None. V. Toram: None. R. Toram: None. Background: More than 1 billion people worldwide are obese: 650 million adults, 340 million adolescents, and 39 million children, and this number is still increasing. At the same time, an estimated 462 million individuals are affected by type 2 diabetes, corresponding to 6.28% of the world's population. Certain studies and reports indicate that in some regions women may have slightly higher or lower obesity prevalence than men, influenced by factors such as socio-economic conditions, lifestyle, and access to healthcare. Purpose: Determine whether chatbots can give medically accurate responses for obesity patients, and observe whether there are disparities in responses across patient demographics. Methods: Four questions were formulated: two targeted the causes of obesity, and two were nearly identical diagnosis questions that differed only in the patient's race and diabetic condition. These last two questions were specifically designed to reveal any differences in the chatbots' responses based on demographic factors. The questions were posed to four chatbots: Claude, Gemini, ChatGPT 4o Mini, and ChatGPT 4o. Textual responses from each chatbot were recorded and scored twice on a scale of 1 to 5: once manually and once as a self-score by the chatbot. Results: The average manual score for all models was greater than 3, indicating a baseline of complexity and accuracy for all of the chatbots tested. Question 2 showed the highest accuracy and the lowest variability, suggesting that chatbots respond better to factual queries. Questions 1, 3, and 4 all displayed high variability in response scores, and these three questions' median scores were also significantly lower than that of question 2.
Unlike question 2, these three questions focused on more abstract topics, suggesting that question type may affect response quality. Conclusion: Chatbots showed significantly less spread and were more accurate on question 2 than on the other, more abstract questions. Responses for the patient in question 4 were more varied but scored higher overall than those for the patient in question 3, emphasizing the need to better educate chatbots on differences in factors such as race or medical condition, which can cause differences in response quality and accuracy. Overall, the results highlight a need to better train chatbots to handle different query types and patient demographics. Presentation: Saturday, July 12, 2025