This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of large language models on pediatric asthma: a comparative study of Claude 3 Opus, Gemini 2.0, ChatGPT-4o, and DeepSeek—a cross-sectional questionnaire study
0
Citations
7
Authors
2026
Year
Abstract
Artificial intelligence (AI) has shown potential for enhancing medical practice and improving patient outcomes. However, the efficacy and linguistic accessibility of large language models (LLMs) in pediatric asthma management remain underexplored. This study evaluated the performance of four LLMs in generating clinical information within this domain. We administered 15 guideline-based pediatric asthma inquiries to ChatGPT-4o, Claude 3 Opus, Gemini 2.0, and DeepSeek. Anonymized responses were independently evaluated by three board-certified pediatric pulmonologists using the DISCERN instrument (score range 16–80). Readability was assessed using six standard indices. Inter-rater reliability was measured with intraclass correlation coefficients (ICC). Statistical analysis included repeated-measures ANOVA and post-hoc comparisons with effect-size reporting. No significant difference was found in the overall quality of health information (DISCERN scores) among the four LLMs (F(3,56) = 0.144, p = .933, η² = 0.008), with all mean scores clustered within a narrow “fair-to-good” range (50.3–51.9). However, significant differences were observed in readability: ChatGPT-4o generated significantly more comprehensible text than DeepSeek (FRE mean difference = 12.41, p = .005, Cohen’s d = 1.28), while DeepSeek performed significantly worse than all other models (all p < .05). Inter-rater reliability was high (ICC range: 0.849–0.901, all p < .001). Critically, the mean readability level of all outputs (FKGL: 13.2–14.9) far exceeded the recommended reading accessibility level for patient materials. While current LLMs can provide generally accurate information on pediatric asthma, their outputs exhibit significant limitations in readability for patient-facing use. ChatGPT-4o shows relative advantages in comprehensibility, yet none meet recommended health-literacy standards.
These findings underscore that AI should serve as a supplementary decision‑support tool under clinician supervision, not as a substitute for professional medical advice. Future work should prioritize the integration of adaptive text‑simplification features, validate AI‑generated content in real‑world clinical and caregiver settings, and expand evaluations to include emerging models and diverse chronic disease contexts.
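The readability results above are stated in terms of the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL). As a minimal sketch of how such indices are computed from the standard published formulas (the study's actual tooling is not specified; the vowel-group syllable heuristic below is a simplifying assumption):

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count contiguous vowel groups, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for an English text.

    FRE  = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
    FKGL = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / max(1, len(sentences))  # mean words per sentence
    spw = syllables / max(1, len(words))       # mean syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher FRE means easier text, while FKGL approximates a U.S. school grade level; the reported FKGL of 13.2–14.9 therefore corresponds to college-level prose, well above the sixth-to-eighth-grade level often recommended for patient-facing materials.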
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations