OpenAlex · Updated hourly · Last updated: 20 May 2026, 21:25

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy of Large Language Models in Answering Dental Examination Questions: A Systematic Review and Meta-Analysis

2026 · 0 citations · International Dental Journal · Open Access

Authors: 10

Abstract

INTRODUCTION: Large language models (LLMs), including OpenAI's GPT family accessed via interfaces such as ChatGPT and Microsoft Copilot, as well as non-GPT systems such as Google Gemini, are increasingly applied in healthcare and dental education. However, the accuracy of these systems in specialized tasks such as answering dental examination questions remains unclear.

METHODS: This systematic review and meta-analysis evaluated LLM performance in answering dental questions. Databases searched were PubMed, Embase, Scopus, and Web of Science. Data on question type and number, LLM versions, and accuracy rates were extracted. Pooled accuracy was estimated using a random-effects model; heterogeneity and publication bias were assessed.

RESULTS: A total of 39 studies were included, with ChatGPT-4 being the most frequently evaluated model. The pooled accuracy for LLMs was 63.7% (95% CI: 60.3%-67.1%), with high heterogeneity (I² = 91.5%). Subgroup analysis revealed ChatGPT-4 and Copilot (a GPT-based interface) achieved the highest pooled accuracies (~73% and ~75%, respectively). Direct comparisons confirmed ChatGPT-4 significantly outperformed earlier versions and some competitor models. Sensitivity analyses supported the robustness of findings.

CONCLUSION: LLMs demonstrate moderate accuracy in answering dental examination questions and are currently insufficient for autonomous clinical decision-making. When their limitations are explicitly recognized, however, these systems may serve as valuable adjuncts in dental education and examination preparation. Methodological strategies such as structured prompting and retrieval-augmented approaches warrant further investigation but were not the primary focus of the present analysis.
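
The abstract reports a random-effects pooled accuracy with an I² heterogeneity statistic but does not say which estimator or transformation was used. The sketch below shows one common way such figures can be computed: DerSimonian-Laird pooling of untransformed proportions. That choice is an assumption, not the authors' stated method. The function name `pool_random_effects` and the study counts are hypothetical and purely illustrative; they are not data from the review.

```python
import math


def pool_random_effects(correct, total):
    """DerSimonian-Laird random-effects pooling of raw proportions.

    Assumes every study has an accuracy strictly between 0 and 1,
    so that the binomial variance is non-zero.
    """
    # Per-study accuracy and its binomial sampling variance.
    p = [c / n for c, n in zip(correct, total)]
    v = [pi * (1 - pi) / n for pi, n in zip(p, total)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1

    # Between-study variance tau^2 (truncated at zero).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate, and 95% confidence interval.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts (correct answers, questions asked) for four studies.
example = pool_random_effects(correct=[60, 150, 45, 320],
                              total=[100, 200, 90, 400])
print({k: round(val, 4) for k, val in example.items()})
```

Published meta-analyses of proportions often pool on a logit or Freeman-Tukey scale instead of the raw scale, so the numbers from this sketch would not necessarily match those produced by the review's own software.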

Similar works