This is an overview page with metadata for this scientific article. The full article is available from the publisher.
A Comparative Assessment of Large Language Models in Congenital Hypothyroidism: Reliability, Quality and Readability
Citations: 0
Authors: 2
Year: 2026
Abstract
Objective: To comparatively evaluate the reliability, quality, and readability of responses generated by widely used large language model (LLM)-based chatbots to congenital hypothyroidism (CH)-related patient questions.

Methods: Forty CH frequently asked questions (FAQs), derived from clinician-reviewed patient education resources, were submitted under standardized conditions (December 2025) to ChatGPT-4, ChatGPT-5.2, Gemini, and Copilot. The modified DISCERN (mDISCERN) instrument was used to assess reliability, whereas the Global Quality Score (GQS) was used to evaluate quality. Readability was evaluated using Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG). Scores were compared using Friedman tests with Bonferroni-corrected post hoc analyses.

Results: Median mDISCERN scores were 5.0 for ChatGPT-4, ChatGPT-5.2, and Gemini, and 4.0 for Copilot. Median GQS scores were 5.0 for ChatGPT-4, ChatGPT-5.2, and Gemini, and 4.0 for Copilot. Differences among models were significant for both mDISCERN and GQS (p<0.001), with ChatGPT-5.2 outperforming others in key pairwise comparisons. Readability differed significantly across all indices (all p<0.001). ChatGPT-5.2 demonstrated the highest FRE and lowest FKGL, whereas Gemini produced the most complex text. However, all models exceeded the recommended sixth-grade reading level.

Conclusion: LLM-based chatbots generated generally moderate-to-high quality CH information, but readability remains suboptimal for patient education. ChatGPT-5.2 showed the best overall performance. LLM outputs may support patient information needs but should complement, not replace, clinician-provided counseling.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,644 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,850 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations