This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Readability of AI-Generated Patient Information Leaflets on Alzheimer’s, Vascular Dementia, and Delirium
Citations: 1
Authors: 3
Year: 2025
Abstract
Background: Large language models such as ChatGPT, DeepSeek, and Gemini are increasingly used to generate patient-facing medical content. While their factual accuracy has been explored, the readability of these outputs remains less well understood. Readability is a crucial component of health communication, particularly for older adults and those with lower health literacy. This study aimed to evaluate and compare the readability of patient information leaflets generated by three large language models (ChatGPT, DeepSeek, and Gemini) on the topics of Alzheimer's disease, vascular dementia, and delirium, using five validated readability metrics.

Materials and methods: We conducted a cross-sectional comparative study of patient information leaflets generated by three large language models on the topics of Alzheimer's disease, vascular dementia, and delirium. Each model was prompted using identical queries, and the resulting texts were evaluated using five established readability metrics: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Automated Readability Index. Readability scores were compared using Kruskal-Wallis tests to identify statistically significant differences between models.

Results: ChatGPT consistently produced the most readable content, with the highest Flesch Reading Ease scores and the lowest grade-level indices. DeepSeek generated text that was markedly more complex and less accessible. Gemini performed intermediately, sometimes matching ChatGPT on specific indices but not consistently across all metrics. The difference in Flesch Reading Ease scores between models was statistically significant (H = 7.20, p = 0.027). Other metrics showed trends that approached significance.

Conclusions: There are meaningful differences in the readability of patient information generated by different large language models. ChatGPT appears to produce content that is more suitable for patient understanding, particularly in the context of older adult care. These findings highlight the need for careful evaluation of readability when using generative AI in clinical communication. Future research should incorporate expert review of content accuracy and appropriateness alongside readability.
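The study's headline metric, Flesch Reading Ease, is a fixed formula over average sentence length and average syllables per word: FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). A minimal Python sketch illustrates the computation; the vowel-group syllable counter is a rough assumption, whereas published tools rely on pronunciation dictionaries, so scores will differ slightly from those reported in the article.

```python
import re

def count_syllables(word):
    # Naive heuristic: count vowel groups, subtract one for a trailing
    # silent "e". Real tools (e.g. CMUdict-based) are more accurate.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)   # average sentence length
    asw = syllables / len(words)        # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw
```

Higher scores indicate easier text: short, monosyllabic sentences score near or above 100, while dense clinical prose can score below zero.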