This is an overview page with metadata for this scientific work. The full article is available from the publisher.
S2997 Health Literacy in the Age of Artificial Intelligence: Readability of LLM-Generated Materials for Patients With IBD
Citations: 0
Authors: 17
Year: 2025
Abstract
Introduction: Large language models (LLMs) are increasingly applied in healthcare, including to generate educational materials for complicated conditions such as inflammatory bowel disease (IBD). It is important to assess whether such LLM-generated materials align with established health literacy standards, as recommended by the National Institutes of Health (NIH) and the Agency for Healthcare Research and Quality. This study assesses the readability of LLM-generated materials for patients with IBD and evaluates whether these emerging technologies can consistently deliver accessible health information.

Methods: Five LLMs (ChatGPT 4o, Gemini 2.5, Claude 4.0, Doximity GPT, and OpenEvidence®) were given 3 prompts: Prompt 1 ("What is [condition]?"), Prompt 2 ("I am a patient that was just diagnosed with [condition]. Explain that to me in simple terms"), and Prompt 3 ("Explain [condition] to a patient at a 6th grade reading level or below"). The conditions were IBD, Crohn's disease, and ulcerative colitis. Each prompt was repeated 3 times for every condition on each LLM, for a total of 135 outputs (5 models × 3 prompts × 3 conditions × 3 repetitions). Readability was assessed with the Simple Measure of Gobbledygook (SMOG) index via the Sydney Health Literacy Lab Health Literacy Editor. The Shapiro-Wilk test was used to assess normality, followed by appropriate statistical analysis of within-model differences across prompts and between-model differences for the same prompt, with post-hoc analysis; a P-value of < 0.05 was considered statistically significant.

Results: Across all 5 LLMs, both within-model and between-model differences were statistically significant for each prompt (P < 0.001), with readability improving from Prompt 1 to Prompt 3. Although ChatGPT averaged better readability than the other LLMs on Prompts 1 and 2, and Claude on Prompt 3, none consistently produced materials at or below a 6th grade reading level. OpenEvidence® had the most complex outputs, consistently averaging the highest SMOG scores across prompts.

Conclusion: This study illustrates that, in their current versions, the 5 tested LLMs generate outputs for patients with IBD at a higher reading level than recommended for healthcare materials, even when explicitly prompted to write at a 6th grade level. Further training of LLMs is needed to ensure they consistently provide accessible health information to patients.
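For context, the SMOG index maps the density of polysyllabic words in a text to a U.S. reading grade level. The Python sketch below is a minimal, illustrative implementation of the published SMOG formula only; it is not the Sydney Health Literacy Lab editor used in the study, and its syllable counter is a rough vowel-group heuristic rather than a dictionary-based one.

import math
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels.
    # Real readability tools use dictionaries or better rules.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def smog_grade(text: str) -> float:
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / max(len(sentences), 1)) + 3.1291

# A score above roughly 6 exceeds the 6th grade target cited in the abstract.
print(round(smog_grade("Inflammatory bowel disease causes chronic intestinal inflammation."), 1))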
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,422 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,300 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,734 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,519 citations