This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of accuracy, quality, and readability of information on hypothyroidism provided by different artificial intelligence chatbot models
Citations: 0
Authors: 5
Year: 2025
Abstract
Objective: This study assessed the accuracy, quality, and readability of responses from three leading AI chatbots (ChatGPT-3.5, DeepSeek-V3, and Google Gemini-2.5) on the diagnosis, treatment, and long-term risks of adult hypothyroidism, comparing their outputs with current clinical guidelines.

Methods: Two thyroid specialists developed 27 questions based on the Guideline for the Diagnosis and Management of Hypothyroidism in Adults (2017 edition), covering three categories: diagnosis, treatment, and long-term health risks. Responses from each AI model were independently evaluated by two reviewers. Accuracy was rated on a six-point Likert scale, quality with the DISCERN tool and a five-point Likert scale, and readability with the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG).

Results: All three AI models demonstrated excellent accuracy (mean score > 4.5) and quality (high-quality rate > 94%). According to the DISCERN tool, no significant difference in overall information quality was observed among the models. However, Gemini-2.5 generated responses of significantly lower quality for treatment-related questions than for diagnostic inquiries. The content generated by all models was relatively difficult to comprehend (low FRE scores and high FKGL/GFI scores), generally requiring a college-level or higher education for adequate understanding.

Conclusion: All three AI chatbots produced highly accurate, high-quality medical information on hypothyroidism, and their responses showed strong consistency with clinical guidelines, underscoring the substantial potential of AI in supporting medical information delivery. However, the consistently high reading difficulty of their outputs may limit their practical utility in patient education. Future research should focus on improving the readability and patient-friendliness of AI outputs (e.g., through prompt engineering and multi-round dialogue optimization) while maintaining professional accuracy, to enable broader application of AI in health education.
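Two of the readability indices used in the study, FRE and FKGL, follow standard published formulas based on sentence length and syllables per word. A minimal sketch of how such scores are computed, assuming a naive vowel-group syllable counter (production tools use dictionary-based syllabification, so exact values will differ):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def _counts(text: str):
    # Sentences approximated by terminal punctuation; words by letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return sentences, max(1, len(words)), syllables

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    # Higher scores mean easier text; 60-70 is "plain English".
    s, w, syl = _counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    # Result approximates the US school grade level needed to read the text.
    s, w, syl = _counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Short, monosyllabic sentences score high on FRE and low on FKGL, while dense clinical prose scores the opposite, which is the pattern the abstract reports for all three chatbots.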
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,553 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,444 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,943 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations