This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative performance of ChatGPT, Gemini, and Deepseek on endodontic exam questions in Turkish and English
Citations: 0
Authors: 1
Year: 2026
Abstract
Large language model-based artificial intelligence (LLM-based AI) applications have become a focal point in the healthcare field. This study aimed to compare the performance of ChatGPT-4, Gemini 2.0, and DeepSeek-R1 in answering endodontics questions from the dentistry specialty examination in both Turkish and English. A total of 130 multiple-choice endodontics questions from the dentistry specialty examination question pool were presented to LLMs developed by OpenAI (ChatGPT-4), Google (Gemini 2.0), and DeepSeek (DeepSeek-R1). The questions were entered into each model under standardized conditions in both English and Turkish. The responses and their explanations were classified according to predefined criteria as “correct answer and explanation”, “correct answer with incorrect explanation”, and “incorrect”. Statistical analysis was performed with the R programming language in the RStudio environment. McNemar’s chi-squared test with continuity correction was applied to analyze each model’s performance in providing correct answers and explanations across the two languages, as well as to compare performance between models. Fisher’s exact test was used to analyze the models’ responses to different question types. The threshold for statistical significance was set at p < 0.05. When analyzed individually, DeepSeek-R1, Gemini 2.0, and ChatGPT-4 all provided correct answers at a higher rate in English than in Turkish. In Turkish, DeepSeek-R1 and Gemini 2.0 provided correct answers with accurate explanations at a significantly higher rate than ChatGPT-4. All models performed significantly better on simple-style questions than on combination-style questions in both languages. These findings indicate that LLMs show promise on standardized tests in dentistry. However, despite their ability to recognize patterns and organize data, they have limitations in fully understanding the concepts underlying the information.
The results also highlight the need for continuous improvements to enhance their effectiveness across different subjects and languages, as well as the potential occurrence of hallucinations in their responses.
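The abstract states that McNemar’s chi-squared test with continuity correction was run in R/RStudio to compare paired correct/incorrect outcomes (e.g. the same model on the same questions in English vs. Turkish). As an illustrative sketch only, the following Python snippet implements that test on hypothetical discordant-pair counts; the counts are invented for demonstration and are not taken from the study.

```python
import math

def mcnemar_cc(b, c):
    """McNemar's chi-squared test with continuity correction (1 df).

    b: pairs correct in condition A only (e.g. English only)
    c: pairs correct in condition B only (e.g. Turkish only)
    Returns (chi-squared statistic, two-sided p-value).
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)),
    # so the p-value needs only the standard library.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts (NOT from the study): 20 questions answered
# correctly in English only, 8 correctly in Turkish only.
chi2, p = mcnemar_cc(20, 8)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With these made-up counts the statistic is (|20 − 8| − 1)² / 28 ≈ 4.32, which falls below the p < 0.05 threshold the study used; the concordant pairs (questions answered the same way in both languages) do not enter the statistic.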
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,287 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,140 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,534 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,450 citations