This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Challenges of using generative AI for patient education in chronic heart failure: an evaluation of content quality, readability, and actionability in cross-platform LLM-generated texts
Citations: 0 · Authors: 4 · Year: 2026
Abstract
Objective: To compare the content quality, readability, and actionability of patient education texts for self-management of chronic heart failure (CHF) generated by five mainstream large language models (LLMs) in China, and to provide a basis for platform selection and for constructing an assessment framework for clinical use.

Methods: A standardized set of 20 questions was developed from a literature review, guidelines, and consensus among cardiovascular experts, covering disease awareness, diagnosis and classification, treatment and rehabilitation, daily management and prevention, and psychosocial dimensions. Using a uniform prompt, responses were generated by DeepSeek-R1, Doubao, ERNIEBot 4.5 Turbo, Qwen3-Max-Thinking-Preview, and Kimi K2. The PEMAT-P scale was used to assess understandability and actionability, the 36-item expanded EQIP scale (EQIP-36 score) to evaluate information completeness and standardization, and the Global Quality Score (GQS) to assess overall quality. In addition, seven readability formulas, including the Flesch Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL), were computed for comparison.

Results: Overall quality was high [GQS median 5.00 (4.00–5.00)], with significant between-platform differences (χ² = 14.47, P = 0.006). Doubao and Kimi K2 achieved the highest GQS [both 5.00 (5.00–5.00)]. DeepSeek-R1 showed the greatest information completeness [EQIP-36 39.20 (36.17–44.23); χ² = 25.07, P < 0.001] but the lowest readability [FRES 19.32 (17.94–36.89) and FKGL 14.28 (13.02–15.85); both P < 0.001]. ERNIEBot 4.5 Turbo and Qwen3-Max-Thinking-Preview were the most readable (FRES ≈ 59; FKGL ≈ 8; both P < 0.001) but had lower EQIP-36 scores. Actionability was limited overall [PEMAT-P actionability 20.00% (0.00–40.00); χ² = 26.40, P < 0.001] and varied by topic, with daily management and prevention outperforming disease knowledge and diagnosis/classification (χ² = 20.86, P < 0.001).
Conclusion: LLMs show potential for patient education in CHF, but there is a structural trade-off between information detail and readability, as well as gaps in actionability and verifiability. Combining search-enhanced and structured-template generation strategies is recommended, together with a governance feedback loop of prompt engineering, clinical expert review, and continuous monitoring, to improve readability alignment, completeness of action instructions, and patient safety.
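The two readability metrics named in the abstract, FRES and FKGL, are computed from words per sentence and syllables per word. As a minimal illustrative sketch of the standard Flesch formulas (the study's actual tooling is not specified here, and the syllable counter below is a naive English-only heuristic, not a validated implementation):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels, treat a trailing
    # silent "e" as non-syllabic, and give every word at least one syllable.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Higher FRES means easier text (DeepSeek-R1's median of 19.32 corresponds to "very difficult"), while FKGL approximates the US school grade needed to follow the text, which is why the two metrics move in opposite directions.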
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations