This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Assessment of ChatGPT for Nutrition Advice: A comparative analysis across multiple languages (Preprint)
Citations: 0
Authors: 6
Year: 2024
Abstract
Background: The increased interest in AI tools such as large language models in the field of medicine, particularly nutrition, underscores the importance of evaluating their efficacy across various languages. While large language models such as ChatGPT-4 have shown competency in English, their performance in underrepresented languages such as Kazakh and Russian still needs to be investigated. Given the lack of non-English training data, it is critical to investigate the capabilities of ChatGPT-4 in providing specific nutritional recommendations across different languages.
Objective: The objective of this research is to evaluate how well the ChatGPT-4 system can provide personalized, evidence-based, and practical nutritional advice in English, Kazakh, and Russian.
Methods: This study was conducted from May 15 to August 31, 2023. Fifty mock patient case studies were input into ChatGPT-4, which generated nutritional recommendations and diet plans. The quality of the generated outputs for the underrepresented languages (Russian and Kazakh) was enhanced through intermediate translation steps using the Google Translate API. All responses were evaluated for personalization, consistency, and practicality using a 5-point Likert scale. To identify significant differences among the three languages, the Kruskal-Wallis test was conducted. Additional pairwise comparisons between languages were carried out using Dunn's post hoc test.
Results: Scores differed significantly among the outputs generated in the three languages (p < 0.0001). While the performance of the ChatGPT-4 system was moderate across all categories for both English and Russian, the Kazakh outputs were not suitable for evaluation. For English outputs, the average scores were 3.32 ± 0.46 for personalization, 3.48 ± 0.43 for consistency, and 3.25 ± 0.41 for practicality and availability. For Russian, the average scores were slightly lower for personalization (3.18 ± 0.38) and consistency (3.38 ± 0.39), and 3.37 ± 0.38 for practicality and availability. For Kazakh, all categories scored just above 1. After the machine translation step, however, the nutritional recommendations in Kazakh improved, and there were no significant differences among the outputs in the three languages.
Conclusions: These observations reveal that, even when the same prompts are used in three different languages, the ChatGPT-4 system's ability to generate coherent responses is limited by insufficient training data in non-English languages. These findings suggest that including non-English training datasets can be valuable for optimizing the performance of large language models. Moreover, this study underscores the potential of automated machine translation as a means of overcoming the existing constraints of the ChatGPT-4 system in providing dietary guidance to non-English-speaking populations.
Clinical trial: Not applicable.
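The abstract names the Kruskal-Wallis test as the method for comparing scores across the three languages. As a rough illustration of what that test computes, the following is a minimal standard-library sketch, not the authors' analysis code. The function names `average_ranks` and `kruskal_wallis_3` are hypothetical, and the sketch is restricted to exactly three groups so that the chi-square reference distribution has 2 degrees of freedom and its tail probability reduces to exp(-H/2). Dunn's post hoc test is not shown.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(a, b, c):
    """Kruskal-Wallis H test for exactly three groups, with tie correction.

    Returns (H, p). With three groups the chi-square approximation has
    2 degrees of freedom, whose survival function is exp(-H / 2).
    """
    groups = [list(a), list(b), list(c)]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Accumulate (rank sum)^2 / group size, walking pooled ranks in group order.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # Made-up Likert scores for illustration only; NOT data from the study.
    english = [3, 4, 3, 4, 3]
    russian = [3, 3, 4, 3, 3]
    kazakh = [1, 1, 2, 1, 1]
    h, p = kruskal_wallis_3(english, russian, kazakh)
    print(f"H = {h:.3f}, p = {p:.4f}")
```

Likert scores produce many ties, so the tie correction matters here. In practice an established statistics library would normally be preferred over a hand-written implementation.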
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,391 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,257 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,685 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,501 citations