OpenAlex · Updated hourly · Last updated: May 11, 2026, 11:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy and Reliability of Artificial Intelligence Chatbots as Public Information Sources in Implant Dentistry

2025 · 4 citations · The International Journal of Oral & Maxillofacial Implants
Open full text at the publisher

Citations: 4 · Authors: 4 · Year: 2025

Abstract

PURPOSE: To evaluate the accuracy, completeness, comprehensibility, and reliability of widely available artificial intelligence (AI) chatbots when addressing clinically significant queries pertaining to implant dentistry.

MATERIALS AND METHODS: A total of 20 questions were devised from a compiled list of the questions most frequently asked or encountered during patient consultations by three experienced prosthodontists. These questions were posed to the ChatGPT-3.5, Gemini, and Copilot AI chatbots on three separate occasions at 12-day intervals. A three-point Likert scale (0 = incorrect; 1 = incomplete or partially correct; 2 = correct) and a two-point scale (true/false) were employed by the authors to grade the accuracy of the responses independently. Completeness and comprehensibility were also evaluated using a three-point Likert scale (scored 1, 2, or 3, with lower scores indicating a worse result). In addition to the questions generated by the specialists, the AI chatbots were asked to provide five frequently asked questions and responses regarding dental implant prostheses. Total scores obtained from the chatbots were compared with one-way analysis of variance (ANOVA). Two-point scale data were analyzed via chi-square test. The reliability of the responses for each chatbot was analyzed by assessing the consistency of repeated responses via Cronbach's alpha (α) coefficients.

RESULTS: When the total accuracy scores of the chatbots were analyzed out of a 40-point total (ChatGPT-3.5 = 28.78 ± 4.06; Gemini = 30.89 ± 4.08; Copilot = 29.11 ± 3.22), one-way ANOVA revealed no statistically significant differences (P = .461). For the two-point accuracy scale data, analyzed via chi-square test, no statistical differences were revealed among the chatbots (P = .336). Gemini showed a higher completeness level than ChatGPT-3.5 (P = .011). There was no statistically significant difference among the AI chatbots in terms of comprehensibility. Copilot demonstrated the greatest overall consistency among the three chatbots, with a Cronbach's α of .863, followed by ChatGPT-3.5 (α = .779) and Gemini (α = .636).

CONCLUSIONS: The accuracy of all three chatbots was found to be similar, and all three demonstrated an acceptable level of consistency. However, given the chatbots' low accuracy rate in answering questions, they should not be relied upon as decision-makers, and the clinician's opinion must be given priority.
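As a rough illustration of the analyses described in the abstract, the Python sketch below computes a one-way ANOVA on the chatbots' total accuracy scores and Cronbach's alpha across repeated administrations. The scores are randomly generated placeholders rather than the study's data, and treating the three repetitions as the scale items for Cronbach's alpha is an assumption about the authors' procedure.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x repetitions) score matrix."""
    k = scores.shape[1]                     # repetitions treated as scale items
    item_vars = scores.var(axis=0, ddof=1)  # variance of each repetition across questions
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Placeholder accuracy scores (0/1/2) for 20 questions x 3 repetitions
# per chatbot -- randomly generated, NOT the study's data.
rng = np.random.default_rng(0)
chatgpt = rng.integers(0, 3, size=(20, 3))
gemini = rng.integers(0, 3, size=(20, 3))
copilot = rng.integers(0, 3, size=(20, 3))

# One-way ANOVA on total accuracy scores (one 40-point total per repetition).
f_stat, p_val = stats.f_oneway(
    chatgpt.sum(axis=0), gemini.sum(axis=0), copilot.sum(axis=0)
)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")

# Consistency of repeated responses for each chatbot.
for name, scores in [("ChatGPT-3.5", chatgpt), ("Gemini", gemini), ("Copilot", copilot)]:
    print(f"{name}: Cronbach's alpha = {cronbach_alpha(scores):.3f}")
```

With real per-question scores in place of the random placeholders, this reproduces the shape of the reported comparisons: a single P value for the between-chatbot ANOVA and one α coefficient per chatbot.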

Topics

Artificial Intelligence in Healthcare and Education