This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of accuracy, quality, and readability of information on hypothyroidism provided by different artificial intelligence chatbot models
Citations: 0
Authors: 5
Year: 2025
Abstract
Objective: This study assessed the accuracy, quality, and readability of responses from three leading AI chatbots (ChatGPT-3.5, DeepSeek-V3, and Google Gemini-2.5) on the diagnosis, treatment, and long-term risks of adult hypothyroidism, comparing their outputs with current clinical guidelines.

Methods: Two thyroid specialists developed 27 questions based on the Guideline for the Diagnosis and Management of Hypothyroidism in Adults (2017 edition), covering three categories: diagnosis, treatment, and long-term health risks. Responses from each AI model were independently evaluated by two reviewers. Accuracy was rated on a six-point Likert scale, quality with the DISCERN tool and a five-point Likert scale, and readability with the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG).

Results: All three AI models demonstrated excellent accuracy (mean score > 4.5) and quality (high-quality rate > 94%). According to the DISCERN tool, no significant difference in overall information quality was observed among the models. However, Gemini-2.5 generated responses of significantly lower quality for treatment-related questions than for diagnostic inquiries. The content generated by all models was relatively difficult to comprehend (low FRE scores and high FKGL/GFI scores), generally requiring a college-level or higher education for adequate understanding.

Conclusion: All three AI chatbots produced highly accurate, high-quality medical information on hypothyroidism, and their responses showed strong consistency with clinical guidelines, underscoring the substantial potential of AI in supporting medical information delivery. However, the consistently high reading difficulty of their outputs may limit their practical utility in patient education. Future research should focus on improving the readability and patient-friendliness of AI outputs (e.g., through prompt engineering and multi-round dialogue optimization) while maintaining professional accuracy, to enable broader application of AI in health education.
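Two of the readability indices used in the study, FRE and FKGL, follow standard published formulas based on sentence length and syllables per word. A minimal sketch of how such scores are computed, assuming a naive vowel-group syllable counter (production tools use dictionary-based syllabification, so exact values will differ):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def _counts(text: str):
    # Sentences approximated by terminal punctuation; words by letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return sentences, max(1, len(words)), syllables

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    # Higher scores mean easier text; 60-70 is "plain English".
    s, w, syl = _counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    # Result approximates the US school grade level needed to read the text.
    s, w, syl = _counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Short, monosyllabic sentences score high on FRE and low on FKGL, while dense clinical prose scores the opposite, which is the pattern the abstract reports for all three chatbots.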
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,553 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,444 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,943 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations