This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluating Large Language Models performance in Endodontics: A clinical experimental study
Citations: 0
Authors: 8
Year: 2026
Abstract
This study aims to evaluate the diagnostic accuracy, consistency, and diagnostic success rates of eight different AI-based chatbots in Endodontics. This cross-sectional study evaluated the diagnostic accuracy of eight diverse AI models, selected for architectural/developer heterogeneity and clinical relevance, using 12 validated fictitious endodontic cases aligned with AAE guidelines; ethical approval was waived as no human data were used. STROBE guidelines were followed to ensure methodological rigor. Standardized prompts ensured uniformity, with three independent executions per case to assess consistency. Responses were anonymized and evaluated by blinded, calibrated reviewers, and statistical analysis included Kruskal-Wallis, Dunn's tests, Fleiss' Kappa, and chi-square to compare diagnostic/treatment accuracy and intramodel agreement. The analysis revealed significant variation in diagnostic accuracy among AI models (p < 0.001), with ChatGPT o1 (97%), Claude (97%), and DeepSeek (90.9%) outperforming Gemini (54.5%). Treatment recommendations showed uniformly high accuracy (97–100%, p = 0.537). Multivariate regression confirmed ChatGPT o1 (OR = 32.7) and Claude (OR = 30.5) as superior, though complex diagnoses (e.g., acute apical abscess, asymptomatic irreversible pulpitis) reduced accuracy (OR = 0.01–0.3, p < 0.05). Stratified analysis identified model-specific vulnerabilities: Gemini failed in reversible pulpitis (0/3, p = 0.001) and chronic apical abscess (0/3, p = 0.001), while ChatGPT o1 struggled with acute apical abscess (0/3, p < 0.001). Overall agreement was 93%, with high intraclass reliability (ICC > 0.85) for top models versus Gemini (ICC = 0.65). Fleiss' Kappa indicated only moderate agreement (κ = 0.28–0.45) in ambiguous cases, emphasizing heterogeneous reliability. In conclusion, seven AI chatbots demonstrated high accuracy in endodontic cases and can be considered helpful tools to complement clinical practice.
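The abstract names the statistical tests used to compare model accuracy and run-to-run consistency. As a rough orientation only, the sketch below shows how such a comparison could be set up in Python with SciPy and statsmodels; it is not the authors' analysis code, and the model names, case counts, and scores are synthetic stand-ins.

```python
# Illustrative sketch (not the study's code): Kruskal-Wallis across chatbots on
# synthetic per-response correctness, plus Fleiss' kappa for repeated-run
# agreement of one model. All data below are made up for demonstration.
import numpy as np
from scipy.stats import kruskal
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)

# Hypothetical scores: 12 cases x 3 runs = 36 responses per model,
# coded 1 = correct diagnosis, 0 = incorrect.
scores = {
    "ChatGPT o1": rng.binomial(1, 0.97, 36),
    "Claude":     rng.binomial(1, 0.97, 36),
    "DeepSeek":   rng.binomial(1, 0.91, 36),
    "Gemini":     rng.binomial(1, 0.55, 36),
}

# Kruskal-Wallis: do accuracy distributions differ across the models?
h_stat, p_value = kruskal(*scores.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")

# Fleiss' kappa for one model's consistency: 12 cases, 3 independent runs,
# each run producing one of 4 hypothetical diagnostic categories.
runs = rng.integers(0, 4, size=(12, 3))   # rows = cases, columns = runs
counts, _ = aggregate_raters(runs)        # case-by-category count table
print(f"Fleiss' kappa = {fleiss_kappa(counts):.2f}")
```

In the study itself, the correctness ratings come from blinded, calibrated reviewers scoring the anonymized chatbot responses, not from simulated values.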
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations