This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial Intelligence Chatbots and Temporomandibular Disorders: A Comparative Content Analysis over One Year
Citations: 0
Authors: 7
Year: 2025
Abstract
As the use of artificial intelligence (AI) chatbots for medical queries expands, their reliability may vary as the underlying models evolve. We longitudinally assessed the quality, reliability, and readability of information on temporomandibular disorders (TMD) generated by three widely used chatbots (ChatGPT, Gemini, and Microsoft Copilot). Ten TMD questions were submitted to each chatbot at two timepoints (T1: February 2024; T2: February 2025). Two blinded evaluators independently assessed all answers using validated instruments: the Global Quality Score (GQS), PEMAT, DISCERN, CLEAR, Flesch Reading Ease (FRE), and Flesch–Kincaid Grade Level (FKGL). Analyses followed METRICS guidance. Comparisons between models and across timepoints were conducted using non-parametric tests. At T1, Copilot scored significantly lower in GQS, CLEAR appropriateness, and relevance (p < 0.01), while ChatGPT provided less evidence-based content than its counterparts (p < 0.001). Reliability was poor across models (mean DISCERN score: 34.73 ± 9.49), and readability was difficult (mean FRE: 34.64; FKGL: 14.13). At T2, performance improved across chatbots, particularly for Copilot, yet actionability remained limited and citations were inconsistent. This year-long longitudinal analysis shows an overall improvement in chatbot performance, although concerns regarding information reliability persist. These findings underscore the importance of human oversight of AI-mediated patient information, reaffirming that clinicians should remain the primary source of patient education.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,400 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,261 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,695 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,506 citations