OpenAlex · Updated hourly · Last updated: 2026-03-12, 03:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Readability of AI-Generated Patient Information Leaflets on Alzheimer’s, Vascular Dementia, and Delirium

2025 · 1 citation · Cureus · Open Access
Open full text at the publisher

Citations: 1 · Authors: 3 · Year: 2025

Abstract

Background: Large language models such as ChatGPT, DeepSeek, and Gemini are increasingly used to generate patient-facing medical content. While their factual accuracy has been explored, the readability of these outputs remains less well understood. Readability is a crucial component of health communication, particularly for older adults and those with lower health literacy. This study aimed to evaluate and compare the readability of patient information leaflets generated by three large language models (ChatGPT, DeepSeek, and Gemini) on the topics of Alzheimer's disease, vascular dementia, and delirium, using five validated readability metrics.

Materials and methods: We conducted a cross-sectional comparative study of patient information leaflets generated by three large language models on the topics of Alzheimer's disease, vascular dementia, and delirium. Each model was prompted using identical queries, and the resulting texts were evaluated using five established readability metrics: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Automated Readability Index. Readability scores were compared using Kruskal-Wallis tests to identify statistically significant differences between models.

Results: ChatGPT consistently produced the most readable content, with the highest Flesch Reading Ease scores and the lowest grade-level indices. DeepSeek generated text that was markedly more complex and less accessible. Gemini performed intermediately, sometimes matching ChatGPT on specific indices but not consistently across all metrics. The difference in Flesch Reading Ease scores between models was statistically significant (H = 7.20, p = 0.027). Other metrics showed trends that approached significance.

Conclusions: There are meaningful differences in the readability of patient information generated by different large language models. ChatGPT appears to produce content that is more suitable for patient understanding, particularly in the context of older adult care. These findings highlight the need for careful evaluation of readability when using generative AI in clinical communication. Future research should incorporate expert review of content accuracy and appropriateness alongside readability.
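The two Flesch metrics named in the abstract have standard published formulas. The paper does not state which tool computed its scores, so the sketch below is illustrative only: it uses a naive vowel-group syllable counter, whereas production tools (e.g. the textstat library) use more careful syllable counting, so exact scores will differ.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def _counts(text: str) -> tuple[int, int, int]:
    # Split sentences on terminal punctuation; count alphabetic words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables

def flesch_reading_ease(text: str) -> float:
    # Higher score = easier text (90-100 ~ very easy, <30 ~ very difficult).
    s, w, syl = _counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text: str) -> float:
    # Maps the same inputs onto a US school-grade level.
    s, w, syl = _counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Scoring each model's leaflet this way yields one number per metric per text, which can then feed a Kruskal-Wallis comparison (e.g. `scipy.stats.kruskal`) as described in the methods.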


Topics

Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Dementia and Cognitive Impairment Research