OpenAlex · Updated hourly · Last updated: 16.05.2026, 01:53

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating large language models using national endodontic specialty examination questions: are they ready for real-world dentistry?

2025 · 10 citations · BMC Medical Education · Open Access
Open full text at publisher

10 citations · 2 authors · Year: 2025

Abstract

BACKGROUND: Large Language Models (LLMs) are artificial intelligence (AI) systems that simulate human language processing through deep learning techniques and neural networks. They are increasingly used for clinical decision support, student training, and enhancing educational processes. However, the reliability of AI models, especially in answering various types of questions, remains a point of debate. Standard multiple-choice questions (MCQs) require selecting one correct answer from five options, whereas combination-type MCQs (C-MCQs) require identifying all correct statements among several alternatives. This study aims to evaluate and compare the performance of various LLMs in answering MCQs and C-MCQs in endodontics. METHODS: A total of 151 endodontic questions were identified through a comprehensive review of publicly available Dentistry Specialty Exams conducted in Turkey since 2012. The questions were presented in Turkish to eight LLMs (ChatGPT-4o, ChatGPT-4, Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 2.0 Flash, Copilot, Deepseek-V3, and Qwen2.5-Max). Accuracy rates for both MCQs and C-MCQs were statistically analyzed using SPSS v23 (p < 0.05). RESULTS: ChatGPT-4o achieved the highest overall accuracy rate (81.5%), while Gemini 1.5 Flash had the lowest (57%). In standard MCQs, ChatGPT-4o significantly outperformed the other models (p < 0.001), but in C-MCQs no significant difference was observed between the models (p = 0.179). Across all models, accuracy rates for C-MCQs were significantly lower than for MCQs (p < 0.05). Deepseek-V3 maintained a more balanced performance across question types than the other models. CONCLUSIONS: LLMs show promising potential as educational tools in endodontics. However, their accuracy varies by question type and model. They can support student learning and clinical decision-making but cannot yet be considered a fully reliable standalone source in endodontics.

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19 · Dental Radiography and Imaging