OpenAlex · Updated hourly · Last updated: 15.03.2026, 18:04

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Challenges of using generative AI for patient education in chronic heart failure: an evaluation of content quality, readability, and actionability in cross-platform LLM-generated texts

2026 · 0 citations · Frontiers in Public Health · Open Access

0 citations · 4 authors · published 2026

Abstract

Objective: To compare the content quality, readability, and actionability of patient education texts for self-management of chronic heart failure (CHF) generated by five mainstream large language models (LLMs) in China, and to provide a basis for platform selection and for constructing an assessment framework for clinical use.

Methods: A standardized set of 20 questions was developed from a literature review, guidelines, and consensus among cardiovascular experts, covering disease awareness, diagnosis and classification, treatment and rehabilitation, daily management and prevention, and psychosocial dimensions. Using a uniform prompt, responses were generated by DeepSeek-R1, Doubao, ERNIEBot 4.5 Turbo, Qwen3-Max-Thinking-Preview, and Kimi K2. The PEMAT-P scale was used to assess understandability and actionability, the 36-item expanded EQIP scale (EQIP-36) to evaluate information completeness and standardization, and the Global Quality Score (GQS) to assess overall quality. In addition, seven readability formulas, including the Flesch Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL), were computed for comparison.

Results: Overall quality was high [GQS median 5.00 (4.00–5.00)], with significant between-platform differences (χ² = 14.47, P = 0.006). Doubao and Kimi K2 achieved the highest GQS [both 5.00 (5.00–5.00)]. DeepSeek-R1 showed the greatest information completeness [EQIP-36 39.20 (36.17–44.23); χ² = 25.07, P < 0.001] but the lowest readability [FRES 19.32 (17.94–36.89) and FKGL 14.28 (13.02–15.85); both P < 0.001]. ERNIEBot 4.5 Turbo and Qwen3-Max-Thinking-Preview were the most readable (FRES ≈ 59; FKGL ≈ 8; both P < 0.001) but had lower EQIP-36 scores. Actionability was limited overall [PEMAT-P actionability 20.00% (0.00–40.00); χ² = 26.40, P < 0.001] and varied by topic, with daily management and prevention outperforming disease knowledge and diagnosis/classification (χ² = 20.86, P < 0.001).

Conclusion: LLMs show potential for patient education in CHF, but there is a structural trade-off between information detail and readability, as well as gaps in actionability and verifiability. Combining enhanced search with structured template generation strategies, and establishing a governance feedback loop of prompt engineering, clinical expert review, and continuous monitoring, is recommended to improve readability alignment, completeness of action instructions, and patient safety.
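The FRES and FKGL metrics used in the study are standard formulas over words per sentence and syllables per word. As a minimal sketch of how such scores are computed (the Flesch coefficients below are the standard published ones; the syllable counter is a naive vowel-group approximation, whereas the study's tooling is unspecified and dedicated libraries use stronger heuristics or dictionaries):

```python
import re

def count_syllables(word: str) -> int:
    """Naive English syllable estimate: count vowel groups, minimum 1.
    A rough approximation; dictionary-based counters are more accurate."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # crude silent-e correction
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Higher FRES means easier text (the abstract's FRES ≈ 19 for DeepSeek-R1 indicates very difficult prose), while FKGL maps directly to a U.S. school grade level (FKGL ≈ 14 corresponds to college-level reading).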


Topics

Artificial Intelligence in Healthcare and Education · Heart Failure Treatment and Management · Machine Learning in Healthcare