This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluation of the Competency of Large Language Models GPT-4o and Claude 3.5 Sonnet in Endodontic Emergencies
Citations: 0
Authors: 2
Year: 2025
Abstract
Purpose: This study aimed to evaluate the accuracy and comprehensiveness of the responses generated by GPT-4o and Claude 3.5 Sonnet to the most frequently asked questions about endodontic emergencies.

Materials and Methods: The most frequently asked questions on nine endodontic topics (inferior alveolar nerve block, sodium hypochlorite accidents, aspiration of dental materials, separated instruments, perforation, transportation, Ca(OH)2 extrusion, root filling, and flare-up) were generated by GPT-3.5. Each question was posed to both GPT-4o and Claude 3.5 Sonnet. Two authors independently scored the responses, assessing accuracy and comprehensiveness for each question using Likert scales. The data were statistically analyzed using the Mann–Whitney U and Kruskal–Wallis tests, with the significance level set at 0.05.

Results: Responses generated by both GPT-4o and Claude 3.5 Sonnet to a total of 81 open-ended questions were evaluated. The two models yielded similar results in terms of accuracy and comprehensiveness (p > 0.05). For GPT-4o, root filling, perforation, and flare-up had the lowest accuracy scores, while root filling and separated instruments had the lowest comprehensiveness scores (p < 0.05). The accuracy of Claude 3.5 Sonnet's responses did not differ significantly between topics (p > 0.05); however, separated instruments had the lowest comprehensiveness scores (p < 0.05).

Conclusion: The accuracy and comprehensiveness scores of GPT-4o and Claude 3.5 Sonnet are statistically similar. Despite their high levels of accuracy and comprehensiveness, neither model can yet replace the operator in endodontic procedures.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 citations