This is an overview page with metadata about this scientific paper. The full article is available from the publisher.

Evaluating the Response of AI-Based Large Language Models to Common Patient Concerns About Endodontic Root Canal Treatment: A Comparative Performance Analysis

2025 · 2 citations · Journal of Clinical Medicine · Open Access

Abstract

Objectives: The aim of this study was to compare the responses of three large language models (LLMs), DeepSeek V3, GPT 5, and Gemini 2.5 Flash, to patients' frequently asked questions (FAQs) about root canal treatment in terms of accuracy and comprehensiveness, and to assess the potential roles of these models in patient education and health literacy.

Methods: A total of 37 open-ended FAQs, compiled from American Association of Endodontists (AAE) patient education materials and online resources, were presented to the three LLMs. Expert clinicians rated the responses for accuracy and comprehensiveness on a 5-point Likert scale. Inter-rater and test-retest reliability were assessed using intraclass correlation coefficients (ICCs). Differences among models were analyzed with the Kruskal-Wallis H test, followed by pairwise Mann-Whitney U tests with effect sizes (Cliff's delta, δ). A p-value < 0.05 was considered statistically significant.

Results: Inter-rater agreement was excellent, with ICCs of 0.92 for accuracy and 0.91 for comprehensiveness. Test-retest reliability also showed high consistency (ICCs of 0.90 for accuracy and 0.89 for comprehensiveness). DeepSeek V3 achieved the highest scores, with a mean accuracy of 4.81 ± 0.39 and a mean comprehensiveness of 4.78 ± 0.41, significantly higher than both GPT 5 (accuracy 4.0 ± 0.0; comprehensiveness 4.05 ± 0.4; p < 0.05; δ = 0.81 for accuracy, δ = 0.69 for comprehensiveness) and Gemini 2.5 Flash (accuracy 3.83 ± 0.68; comprehensiveness 3.81 ± 0.7; p < 0.05; δ = 0.71 for accuracy, δ = 0.70 for comprehensiveness). No significant difference was observed between GPT 5 and Gemini 2.5 Flash for either accuracy (p = 0.109, δ = 0.16) or comprehensiveness (p = 0.058, δ = 0.21).
Conclusions: LLMs that provide satisfactory responses to FAQs, such as DeepSeek V3, may serve as valuable supportive tools in patient education and health literacy; however, expert clinician oversight remains essential in clinical decision-making and treatment planning. When used appropriately, LLMs can enhance patient awareness and support patient satisfaction throughout root canal treatment.
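The Methods name Cliff's delta as the effect size for the pairwise Mann-Whitney comparisons. For a score X from one model and a score Y from another, δ = P(X > Y) - P(X < Y), so δ = 0.81 means that, over all pairs of scores, "first model higher" outnumbers "second model higher" by 81 percentage points. The following is a minimal, stdlib-only Python sketch of how these rank statistics can be computed. It is not the authors' analysis code (the abstract does not name their software); all function names are hypothetical; the Kruskal-Wallis p-value uses the large-sample chi-square approximation and is hard-coded for exactly three groups (k - 1 = 2 degrees of freedom, where the chi-square survival function reduces to exp(-H/2)); and the score vectors are invented for illustration, not the study's data. The ICC reliability analysis is not covered.

```python
"""Hypothetical sketch: Cliff's delta, Mann-Whitney U and Kruskal-Wallis H (stdlib only)."""
import math
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta = P(X > Y) - P(X < Y), estimated over all (x, y) pairs."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def mann_whitney_u(xs, ys):
    """Mann-Whitney U for xs: pairs with x > y count 1, tied pairs count 0.5."""
    u = 0.0
    for x, y in product(xs, ys):
        if x > y:
            u += 1.0
        elif x == y:
            u += 0.5
    return u


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H, assigning mid-ranks to tied values."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values equal to pooled[i].
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    # Standard tie correction: 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def kruskal_p_three_groups(h):
    """Asymptotic p-value for exactly three groups: chi-square, 2 df, exp(-H/2)."""
    return math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical Likert scores for 37 questions; NOT the study's data.
    model_a = [5] * 30 + [4] * 7
    model_b = [4] * 37
    model_c = [5] * 5 + [4] * 22 + [3] * 10
    print(round(cliffs_delta(model_a, model_b), 2))  # prints 0.81
    u = mann_whitney_u(model_a, model_b)
    # Same quantity via U: delta = 2U / (n1 * n2) - 1
    print(2 * u / (len(model_a) * len(model_b)) - 1)
    h = kruskal_wallis_h(model_a, model_b, model_c)
    print(h, kruskal_p_three_groups(h))
```

Cliff's delta pairs naturally with the Mann-Whitney test because both are built from the same pairwise comparisons: with ties counted as one half in U, δ = 2U/(n1·n2) - 1, which the usage block prints as a cross-check.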

Institutions

Bülent Ecevit University (TR)

Topics

Dental Radiography and Imaging · Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19