This is an overview page with metadata about this scholarly article. The full article is available from the publisher.
How Accurate and Consistent Are Large Language Models in Restorative Dentistry Questions? A Cross-Sectional Test-Retest Study
Citations: 0
Authors: 2
Year: 2026
Abstract

Background: In recent years, large language models (LLMs) have emerged as a notable breakthrough in artificial intelligence. This study aimed to compare the accuracy of different LLMs on multiple-choice questions (MCQs) in restorative dentistry from the dental specialisation exam (DUS) administered in Turkey, and to evaluate the consistency of their responses between two sessions (test–retest reliability).

Methods: The study used 127 text-based restorative dentistry MCQs from the DUS that contained no visual material. Responses from ChatGPT-5.1, Gemini 2.5 Pro, Microsoft Copilot, and DeepSeek-v3.2 were collected at two time points (T1 and T2) and coded as correct or incorrect against the official answer key. Accuracy differences between the models were analysed with Cochran's Q test, and variation between sessions with the McNemar test. Test–retest reliability of the responses was assessed with Cohen's kappa coefficient and percentage agreement.

Results: ChatGPT-5.1 achieved the highest accuracy in both sessions, whereas DeepSeek-v3.2 showed comparatively lower accuracy. However, no statistically significant difference was found between the models' T1 and T2 accuracy rates. Subcategory analyses likewise identified no significant performance difference between the models. Test–retest analyses showed that, despite high accuracy, response stability varied by model, with Cohen's kappa values ranging from low to moderate.

Conclusions: LLMs appear able to answer theoretical knowledge questions in restorative dentistry with high accuracy, but they may show limitations in the consistency of their responses over time. These findings suggest that although LLMs have potential as supportive tools in dental education, using them without human oversight requires careful consideration.
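The Methods name four statistics: Cochran's Q, McNemar, Cohen's kappa and percentage agreement. The following is a minimal Python sketch, standard library only, of how each could be computed from 1/0 (correct/incorrect) coding. It rests on several assumptions. The function names are hypothetical and this is not the authors' analysis code. The abstract does not say whether an exact or a chi-square McNemar test was used, so the exact binomial variant here is an assumption. Kappa treats a model's T1 and T2 answers as two "raters". The toy data are invented and are not study results.

```python
# Minimal sketch with hypothetical helper names; not the authors' analysis code.
# Answers are coded 1 = correct, 0 = incorrect, as described in the Methods.
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: the df = 1 term (erfc) plus the incomplete-gamma recursion terms.
    extra = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * extra


def cochran_q(matrix: Sequence[Sequence[int]]) -> tuple[float, float]:
    """Cochran's Q for k related binary samples; rows = questions, columns = models."""
    if not matrix:
        raise ValueError("need at least one question")
    k = len(matrix[0])
    col_totals = [sum(row[j] for row in matrix) for j in range(k)]
    row_totals = [sum(row) for row in matrix]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when all models agree on every question")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, chi2_sf(q, k - 1)  # df = k - 1


def mcnemar_exact(t1: Sequence[int], t2: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for one model's T1 vs T2 answers."""
    b = sum(1 for x, y in zip(t1, t2) if x == 1 and y == 0)  # correct only at T1
    c = sum(1 for x, y in zip(t1, t2) if x == 0 and y == 1)  # correct only at T2
    n = b + c
    if n == 0:
        return 1.0
    # Under H0, b ~ Binomial(b + c, 0.5); double the smaller tail, capped at 1.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def agreement_and_kappa(t1: Sequence[int], t2: Sequence[int]) -> tuple[float, float]:
    """Proportion of identical answers and Cohen's kappa, treating T1 and T2 as two raters."""
    n = len(t1)
    p_o = sum(1 for x, y in zip(t1, t2) if x == y) / n
    p1, p2 = sum(t1) / n, sum(t2) / n
    p_e = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement from the marginals
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Invented toy data, NOT results from the study.
    t1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    t2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
    print(agreement_and_kappa(t1, t2))  # approximately (0.8, 0.375)
    print(mcnemar_exact(t1, t2))  # 1.0
    answers = [  # rows = questions, columns = four models
        [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
        [1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0],
    ]
    print(cochran_q(answers))  # Q is about 5.14 with df = 3
```

In the invented example, both sessions are 80% correct and the answers agree on 80% of items, yet kappa is only 0.375. This is because kappa subtracts the agreement expected by chance, which is high when most answers are correct. That known property of kappa is one way high accuracy can coexist with low to moderate test–retest kappa.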