This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparison of the ChatGPT and deepseek models in responding to multiple choice questions related to rehabilitation of completely edentulous patients with complete dentures
Citations: 0
Authors: 4
Year: 2026
Abstract
Artificial intelligence (AI) chatbots are considered a potential resource for dental education. However, adequate knowledge regarding the trustworthiness, validity, and utility of the content on these platforms is still lacking, and their application in dental education remains understudied. A set of 100 multiple-choice questions (MCQs) related to removable complete denture prosthodontics was formulated, and two AI models, DeepSeek-V3 and ChatGPT-4o, were queried with them. First, the accuracy of the generated answers was assessed. Subsequently, two reviewers rated the usefulness and reliability of the large language model (LLM) responses using modified 5-point Likert scales. The McNemar test was used to compare the accuracy of ChatGPT-4o and DeepSeek-V3, the Wilcoxon signed-rank test to compare the reliability and usefulness ratings of the two AI tools, and the chi-square test to assess proportional differences between question type and answer accuracy (α = 0.05). Response accuracy was 59% for ChatGPT-4o and 66% for DeepSeek-V3, with no significant difference between the two tools (p = 0.281). Reliability scores differed significantly between ChatGPT-4o and DeepSeek-V3 (p = 0.027), with DeepSeek-V3 achieving a statistically significantly higher overall reliability score; usefulness scores likewise differed significantly between the two tools (p < 0.001). ChatGPT-4o produced inaccurate responses significantly more often for analytical questions than for knowledge-based questions (p = 0.047). In conclusion, there was no significant difference in accuracy between ChatGPT-4o and DeepSeek-V3, but the responses generated by DeepSeek-V3 were significantly more reliable and useful than those of ChatGPT-4o, and ChatGPT-4o showed greater inaccuracies for analytical questions than for knowledge-based ones.
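The McNemar test used in the abstract compares two models on the same question set by counting only the discordant pairs, i.e. questions where exactly one model answered correctly. A minimal stdlib sketch of the exact (binomial) variant of the test follows; the discordant counts in the usage example are hypothetical and are not taken from the paper:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: questions model A answered correctly and model B incorrectly.
    c: questions model B answered correctly and model A incorrectly.
    Under the null hypothesis the discordant count follows Bin(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Two-sided p-value: twice the lower binomial tail, capped at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical discordant counts for illustration only:
print(mcnemar_exact(1, 8))  # small p: models plausibly differ
```

With balanced discordant counts (e.g. `b == c`) the p-value is 1.0, matching the intuition that neither model outperforms the other on the questions where they disagree.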
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations