This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative performance of ChatGPT, Gemini, and Deepseek on endodontic exam questions in Turkish and English
Citations: 0
Authors: 1
Year: 2026
Abstract
Large language model-based artificial intelligence (LLM-based AI) applications have become a focal point in the healthcare field. This study aimed to compare the performance of ChatGPT-4, Gemini 2.0, and DeepSeek-R1 in answering endodontics questions from the dentistry specialty examination in both Turkish and English. A total of 130 multiple-choice endodontics questions from the dentistry specialty examination question pool were presented to LLMs developed by OpenAI (ChatGPT-4), Google (Gemini 2.0), and DeepSeek (DeepSeek-R1). The questions were entered into each model under standardized conditions in both English and Turkish. The responses and their explanations were classified according to predefined criteria as “correct answer and explanation”, “correct answer with incorrect explanation”, and “incorrect”. Statistical analysis was performed with the R programming language in the RStudio environment. McNemar’s chi-squared test with continuity correction was applied to analyze each model’s performance in providing correct answers and explanations across the two languages, as well as to compare performance between models. Fisher’s exact test was used to analyze the models’ responses to different question types. The threshold for statistical significance was set at p < 0.05. When analyzed individually, DeepSeek-R1, Gemini 2.0, and ChatGPT-4 all provided correct answers at a higher rate in English than in Turkish. In Turkish, DeepSeek-R1 and Gemini 2.0 provided correct answers with accurate explanations at a significantly higher rate than ChatGPT-4. All models performed significantly better on simple-style questions than on combination-style questions in both languages. These findings indicate that LLMs show promise on standardized tests in dentistry. However, despite their ability to recognize patterns and organize data, they have limitations in fully understanding the concepts underlying the information.
The results also highlight the need for continuous improvements to enhance their effectiveness across different subjects and languages, as well as the potential occurrence of hallucinations in their responses.
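The abstract states that McNemar’s chi-squared test with continuity correction was run in R/RStudio to compare paired correct/incorrect outcomes (e.g. the same model on the same questions in English vs. Turkish). As an illustrative sketch only, the following Python snippet implements that test on hypothetical discordant-pair counts; the counts are invented for demonstration and are not taken from the study.

```python
import math

def mcnemar_cc(b, c):
    """McNemar's chi-squared test with continuity correction (1 df).

    b: pairs correct in condition A only (e.g. English only)
    c: pairs correct in condition B only (e.g. Turkish only)
    Returns (chi-squared statistic, two-sided p-value).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)),
    # so the p-value needs only the standard library.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts (NOT from the study): 20 questions answered
# correctly in English only, 8 correctly in Turkish only.
chi2, p = mcnemar_cc(20, 8)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With these made-up counts the statistic is (|20 − 8| − 1)² / 28 ≈ 4.32, which falls below the p < 0.05 threshold the study used; the concordant pairs (questions answered the same way in both languages) do not enter the statistic.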
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,287 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,140 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,534 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,450 citations