This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Validity and reliability of ChatGPT's responses on dietary supplements in Japan: A quality assessment and content analysis
Citations: 0
Authors: 7
Year: 2026
Abstract
Objective: This study evaluated the validity and reliability of large language model (LLM) responses on dietary supplements (DS), a domain marked by scientific controversy and misinformation. The goal was to support informed consumer decisions and guide improvements in LLM performance. Methods: We collected responses from GPT-4 and GPT-4o on the effects of 30 DS on six diseases. Two medical professionals categorized each response as "Effective," "Uncertain," or "Not Effective." They also created a guideline to assess evidence-based effectiveness and compared it with LLM-generated responses to determine accuracy. Additionally, we conducted qualitative content analysis to identify response patterns and misleading content. Results: GPT-4 and GPT-4o affirmed DS effectiveness in only 10% of cases, with 40% rated as "Uncertain" and 50% as "Not Effective." Accuracy was about 57%, considerably lower than that observed in nutrition-related studies (57% for DS vs. approximately 80% in structured nutrition tasks). Content analysis showed templated responses, frequent ambiguity, and occasional inclusion of irrelevant or incorrect information. Conclusion: Our findings suggest that ChatGPT's responses on dietary supplements are generally cautious but often ambiguous, with a moderate risk of misinformation. As generative AI becomes a common source for health advice, these limitations could mislead users. Enhancing LLMs' evidence-based accuracy and ensuring consistent professional guidance are essential. Innovation: This is the first study to assess the validity and reliability of LLM-generated responses on dietary supplements using both quantitative and qualitative methods. We also developed a novel evidence-based framework to evaluate supplement effectiveness, providing a new tool for future research and supporting safer AI-assisted health communication.