OpenAlex · Updated hourly · Last updated: May 1, 2026, 10:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Large Language Models in Patient Health Communication for Atherosclerotic Cardiovascular Disease: Pilot Cross-Sectional Comparative Analysis

2026 · 1 citation · JMIR Medical Informatics · Open Access

Citations: 1

Authors: 10

Year: 2026

Abstract

BACKGROUND: Large language models (LLMs) have emerged as promising tools for enhancing public access to medical information, particularly for chronic diseases such as atherosclerotic cardiovascular disease (ASCVD). However, their effectiveness in patient-centered health communication remains underexplored, especially in multilingual contexts. OBJECTIVE: Our study aimed to conduct a comparative evaluation of 3 advanced LLMs (DeepSeek R1, ChatGPT-4o, and Gemini) in generating responses to ASCVD-related patient queries in both English and Chinese, assessing their performance across the domains of accuracy, completeness, and comprehensibility. METHODS: We conducted a cross-sectional evaluation based on 25 clinically validated ASCVD questions spanning 5 domains (definitions, diagnosis, treatment, prevention, and lifestyle). Each question was submitted 5 times to each of the 3 LLMs in both English and Chinese, yielding 750 responses in total, all generated under default settings to approximate real-world conditions. Three board-certified cardiologists blinded to model identity independently scored the responses using standardized Likert scales with predefined anchors. The assessment followed a rigorous multistage process that incorporated randomization, washout periods, and final consensus scoring. RESULTS: DeepSeek R1 achieved the highest "good response" rates (24/25, 96% in both English and Chinese), substantially outperforming ChatGPT-4o (21/25, 84%) and Gemini (12/25, 48% in English and 17/25, 68% in Chinese). DeepSeek R1 demonstrated superior median accuracy scores (6, IQR 6-6 in both languages) and completeness scores (3, IQR 2-3 in both languages) compared to the other models (P<.001). All models had a median comprehensibility score of 3; however, in English, DeepSeek R1 and ChatGPT-4o were rated significantly clearer than Gemini (P=.006 and P=.03, respectively), whereas no significant between-model differences were observed in Chinese (P=.08).
Interrater reliability was moderate (Kendall W: accuracy=0.578; completeness=0.565; comprehensibility=0.486). Performance was consistently stronger for definitional and diagnostic questions than for treatment and prevention topics across all models. Notably, none of the models consistently provided responses aligned with the latest clinical guidelines for the key guideline-facing question: "What is the standard treatment regimen for ASCVD?" CONCLUSIONS: DeepSeek R1 exhibited promising and consistent performance in generating high-quality, patient-facing ASCVD information in both English and Chinese, highlighting the potential of open-source LLMs for promoting digital health literacy and equitable access to chronic disease information. However, a clinically critical weakness was observed in guideline-sensitive treatment content: the models did not reliably provide guideline-concordant standard treatment regimens, suggesting that LLM use should be limited to lower-risk informational subqueries (eg, definitions, diagnosis, and lifestyle education) unless augmented by expert oversight and safety controls.
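The reported interrater reliability uses Kendall's W (coefficient of concordance), which measures how closely multiple raters' rankings agree, from 0 (no agreement) to 1 (perfect agreement). A minimal sketch of the tie-free formula follows; the rater rankings in the example are hypothetical, not data from the study:

```python
def kendalls_w(ratings):
    """Kendall's W for m raters each ranking the same n items (no ties).

    ratings: list of m lists; each inner list holds one rater's ranks
    for the n items. W = 12*S / (m^2 * (n^3 - n)), where S is the sum
    of squared deviations of the items' rank totals from their mean.
    """
    m = len(ratings)
    n = len(ratings[0])
    # total rank each item received across all raters
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three hypothetical raters ranking five responses (1 = best).
raters = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
print(round(kendalls_w(raters), 3))  # high agreement, W ≈ 0.844
```

Identical rankings from every rater yield W = 1.0; values around 0.5, as reported here, indicate moderate agreement. Note that the study scored responses on Likert scales, so a ties-corrected variant of W would be applied in practice.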
