This is an overview page with metadata about this scholarly work. The full article is available from the publisher.
Effectiveness of Large Language Models in Stroke Rehabilitation Health Education: A Comparative Study of ChatGPT-4, MedGo, Qwen, and ERNIE Bot (Preprint)
Citations: 0
Authors: 11
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Stroke is a leading cause of disability and death worldwide, and home-based rehabilitation plays a crucial role in improving patient prognosis and quality of life. Traditional health education models often fall short in precision, personalization, and accessibility. Large language models (LLMs), by contrast, are attracting attention in health education because of their advanced natural language processing capabilities. However, the effectiveness of LLMs in home-based stroke rehabilitation remains uncertain. </sec> <sec> <title>OBJECTIVE</title> This study evaluates the effectiveness of four LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot) in home-based stroke rehabilitation health education. The aim is to offer stroke patients more precise and safer health education pathways and to explore the feasibility of using LLMs to guide health education. </sec> <sec> <title>METHODS</title> In the first phase, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to stroke patients in home-based rehabilitation. These were entered into the four LLMs in simulated consultations. Six medical experts (2 clinicians, 2 nursing specialists, and 2 rehabilitation therapists) rated the LLM-generated responses on a 5-point Likert scale for accuracy, completeness, readability, safety, and humanity. In the second phase, the two top-performing models from phase one were selected, and 30 stroke patients undergoing home-based rehabilitation were recruited. Each patient asked both models 3 questions and rated the responses on a satisfaction scale; the readability, text length, and recommended reading age of the responses were assessed with a Chinese readability analysis tool. Data were analyzed using one-way ANOVA, post hoc Tukey HSD tests, and paired t-tests. 
</sec> <sec> <title>RESULTS</title> The four models differed significantly in all five dimensions: accuracy (P = .002), completeness (P < .001), readability (P = .04), safety (P = .007), and humanity (P < .001). ChatGPT-4 scored highest in every dimension: accuracy (M = 4.28, SD = 0.84), completeness (M = 4.35, SD = 0.75), readability (M = 4.28, SD = 0.85), safety (M = 4.38, SD = 0.81), and humanity (M = 4.65, SD = 0.66). MedGo performed well in accuracy (M = 4.06, SD = 0.78) and completeness (M = 4.06, SD = 0.74). Qwen and ERNIE Bot scored significantly lower than ChatGPT-4 and MedGo across all five dimensions. ChatGPT-4 generated the longest responses (M = 1338.35, SD = 236.03) and had the highest readability score (M = 12.88). In the second phase, ChatGPT-4 performed best overall, while MedGo provided the clearest responses. </sec> <sec> <title>CONCLUSIONS</title> LLMs showed strong performance in home-based stroke rehabilitation education and demonstrate significant potential for real-world application. However, further improvements are needed in accuracy, professionalism, and oversight. </sec>
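The METHODS section names two core tests: a one-way ANOVA comparing the four models on each rating dimension, and a paired t-test comparing the two phase-two models rated by the same patients. The following is a minimal sketch of how those test statistics are computed, assuming each rating is an integer on the 1 to 5 Likert scale grouped by model. The function names and all ratings below are hypothetical toy values, not study data. P-values and the post hoc Tukey HSD step are omitted; in practice one might use `scipy.stats.f_oneway`, `scipy.stats.ttest_rel`, and `scipy.stats.tukey_hsd` for those.

```python
from math import sqrt
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating groups (one group per model)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of each model's mean around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of ratings around their own model's mean
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    # F is the ratio of the between-group and within-group mean squares
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def paired_t(a, b):
    """Return (t, df) for paired ratings where a[i] and b[i] come from the same patient."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t is the mean difference divided by its standard error
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Hypothetical expert ratings for four models on a single dimension
    ratings = [[5, 4, 5, 4, 5], [4, 4, 3, 4, 4], [3, 3, 4, 3, 3], [3, 2, 3, 3, 4]]
    print(one_way_anova_f(ratings))
    # Hypothetical satisfaction ratings from six patients for two models
    print(paired_t([5, 4, 5, 4, 5, 4], [4, 4, 4, 3, 4, 4]))
```

A paired test fits the second phase because each patient rated both models, so the comparison uses within-patient differences rather than treating the two sets of ratings as independent samples.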