OpenAlex · Updated hourly · Last updated: 05.04.2026, 14:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Challenges of using generative artificial intelligence for diabetes patient education: a cross-platform analysis of the quality, readability, and actionability of text generated by large language models

2026 · 0 citations · Frontiers in Public Health · Open Access

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Objective: To compare, across large language model (LLM) platforms, the quality, readability, and completeness of action-oriented instructions in diabetes self-management education texts, and to quantify the associations among these domains to inform model selection and risk mitigation.

Methods: Ten LLM platforms were used to generate diabetes education texts (total n = 200), stratified by topic. Outcomes included the Global Quality Score (GQS), the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), and EQIP-36 (Ensuring Quality Information for Patients, 36-item version). Text characteristics, including word count, sentence count, and syllable count, were recorded. Readability was assessed using the Automated Readability Index (ARI), Coleman–Liau Index (CLI), Flesch–Kincaid Grade Level (FKGL), Flesch Reading Ease Score (FRES), Gunning Fog Index (GFOG), Linsear Write (LW), and the Simple Measure of Gobbledygook (SMOG). Between-platform differences were evaluated using one-way ANOVA or the Kruskal–Wallis test, as appropriate. Associations between readability indices and GQS, PEMAT-P, and EQIP-36 were examined using correlation heat maps and exploratory stepwise multiple linear regression. Because the readability indices were highly intercorrelated, these regression analyses were considered exploratory and were used to identify candidate readability-related correlates rather than definitive independent predictors.

Results: GQS and PEMAT-P differed significantly across platforms (both p < 0.001), whereas EQIP-36 did not (p = 0.062). Text length and readability also varied by platform (most p < 0.001). After stratification by topic, PEMAT-P understandability, PEMAT-P total score, and GQS no longer differed significantly across topics (p = 0.356, p = 0.247, and p = 0.182, respectively), whereas PEMAT-P actionability (p < 0.001), EQIP-36 (p < 0.001), and several readability metrics remained significantly different. Difficulty indices were strongly intercorrelated, and FRES was inversely associated with multiple difficulty indices. Exploratory regression analyses suggested that greater reading burden tended to co-occur with lower GQS, PEMAT-P, and EQIP-36 scores.

Conclusion: LLM-generated diabetes education texts exhibit marked cross-platform heterogeneity, and exploratory analyses suggest a potential trade-off between readability and both information quality and the completeness of action-oriented instructions. Clinical implementation should therefore combine careful platform selection, structured prompting with templates, human–AI review, and continuous quality monitoring to support safe, readable, and actionable patient education.
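The FKGL and FRES formulas cited in the Methods are standard published formulas built from words per sentence and syllables per word. The sketch below shows how such indices are computed; the syllable counter is a rough vowel-group heuristic for illustration only, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, discounting a trailing silent "e".
    # Real readability tools use pronunciation dictionaries instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)           # words per sentence
    spw = syllables / len(words)                # syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch–Kincaid Grade Level
    fres = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease Score
    return fkgl, fres

fkgl, fres = readability("The cat sat. The dog ran.")
# Short, monosyllabic sentences: very low grade level, very high reading ease.
```

Lower FKGL and higher FRES both indicate easier text, which is why the abstract reports FRES as inversely associated with the difficulty indices.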


Topics

Health Literacy and Information Accessibility · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education