OpenAlex · Updated hourly · Last updated: 18.03.2026, 17:35

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Reliability, quality, and readability of AI-generated vocal health and hygiene information: a cross-sectional comparative study

2026 · 0 citations · The Egyptian Journal of Otolaryngology · Open Access
Open full text at the publisher

Citations: 0 · Authors: 3 · Year: 2026

Abstract

Objectives: To compare the reliability, quality, and readability of artificial intelligence (AI)-generated patient information on vocal health and hygiene produced by three general-purpose large language models (LLMs): ChatGPT-5 (OpenAI), Grok-4 (xAI), and Claude System-4 (Anthropic).

Methods: Twenty-five frequently asked questions related to vocal health and hygiene were developed using established otolaryngology guidelines, major patient education resources, and clinical experience. Each question was submitted verbatim to all three LLMs under standardized prompting conditions. Responses were assessed for reliability (mDISCERN), information quality (EQIP), and readability (FRES, FKGL). Two otolaryngologists independently rated all outputs while blinded to the identity of the models. Inter-rater agreement was assessed using intraclass correlation coefficients (ICC) for EQIP and Cohen's κ for mDISCERN.

Results: Claude demonstrated significantly higher mDISCERN and EQIP scores than ChatGPT and Grok (p < 0.001), indicating greater reliability and information quality within the evaluated FAQ set. Readability indices did not differ significantly among models (FRES, p = 0.510; FKGL, p = 0.590), and all outputs fell in the upper high school to college-level range (mean FKGL ≈ 12; low FRES values in the "difficult" range), exceeding recommended health literacy thresholds. Subgroup analyses showed similar patterns across thematic domains. A strong positive correlation was observed between mDISCERN and EQIP (ρ = 0.72, p < 0.001).

Conclusion: Under standardized prompting conditions, AI chatbots generated structured and internally consistent patient information; however, the elevated reading level limits accessibility for the general public. Among the evaluated LLMs, Claude produced the highest reliability and quality scores within this structured question set. These systems may serve as supplementary educational resources, but professional clinical evaluation and individualized patient assessment remain essential.

Topics

Voice and Speech Disorders · Artificial Intelligence in Healthcare and Education · Phonocardiography and Auscultation Techniques