This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparison of the ChatGPT and deepseek models in responding to multiple choice questions related to rehabilitation of completely edentulous patients with complete dentures
Citations: 0
Authors: 4
Year: 2026
Abstract
Artificial intelligence (AI) chatbots are considered a potential resource for dental education. However, adequate knowledge regarding the trustworthiness, validity, and utility of the content on these platforms is still lacking, and their application in dental education remains understudied. A set of 100 multiple-choice questions (MCQs) related to removable complete denture prosthodontics was formulated, and two AI models, DeepSeek-V3 and ChatGPT-4o, were queried with them. First, the accuracy of the generated answers was assessed. Subsequently, two reviewers rated the usefulness and reliability of the large language model (LLM) responses using modified 5-point Likert scales. The McNemar test was used to compare the accuracy of ChatGPT-4o and DeepSeek-V3, the Wilcoxon signed-rank test to compare the reliability and usefulness ratings of the two AI tools, and the chi-square test to assess proportional differences between question type and answer accuracy (α = 0.05). Response accuracy was 59% for ChatGPT-4o and 66% for DeepSeek-V3, with no significant difference between the two tools (p = 0.281). Reliability scores differed significantly between ChatGPT-4o and DeepSeek-V3 (p = 0.027), with DeepSeek-V3 achieving a statistically significantly higher overall reliability score; usefulness scores likewise differed significantly between the two tools (p < 0.001). ChatGPT-4o produced inaccurate responses significantly more often for analytical questions than for knowledge-based questions (p = 0.047). In conclusion, there was no significant difference in accuracy between ChatGPT-4o and DeepSeek-V3, but the responses generated by DeepSeek-V3 were significantly more reliable and useful than those of ChatGPT-4o, and ChatGPT-4o showed greater inaccuracies for analytical questions than for knowledge-based ones.
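The McNemar test used in the abstract compares two models on the same question set by counting only the discordant pairs, i.e. questions where exactly one model answered correctly. A minimal stdlib sketch of the exact (binomial) variant of the test follows; the discordant counts in the usage example are hypothetical and are not taken from the paper:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: questions model A answered correctly and model B incorrectly.
    c: questions model B answered correctly and model A incorrectly.
    Under the null hypothesis the discordant count follows Bin(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Two-sided p-value: twice the lower binomial tail, capped at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical discordant counts for illustration only:
print(mcnemar_exact(1, 8))  # small p: models plausibly differ
```

With balanced discordant counts (e.g. `b == c`) the p-value is 1.0, matching the intuition that neither model outperforms the other on the questions where they disagree.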
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations