This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

The impact of language differences on the readability, quality, and reliability of information provided by artificial intelligence chatbots regarding vital pulp therapy: a cross-sectional study

2025 · 1 citation · BMC Oral Health · Open Access


Abstract

The increasing use of artificial intelligence (AI) chatbots in healthcare has highlighted the need to evaluate the accuracy, reliability, and readability of the clinical information they provide. Vital pulp therapy is one of the fundamental biological approaches in modern dentistry aimed at preserving pulp vitality, and the quality of information related to this topic is highly important for clinical decision-making. The present study aimed to assess whether the readability, quality, and reliability of information provided by six different AI-based chatbots (ChatGPT, ChatGPT-4o, Gemini, Microsoft Copilot, Perplexity, and Claude) regarding vital pulp therapy vary depending on language differences.

After a comprehensive literature review, 12 questions related to vital pulp therapy were developed. Each question was submitted to the chatbots in both Turkish and English for 7 consecutive days. The responses obtained in both languages were evaluated for readability using the Flesch Reading Ease Score (FRES) for English and the Ateşman Readability Formula for Turkish. Information quality was assessed using the Global Quality Scale (GQS), while reliability was evaluated based on the Journal of the American Medical Association (JAMA) benchmarks. Statistical analyses were performed using ANOVA, Bonferroni, and Chi-square tests, with a significance level set at p < 0.05.

The findings demonstrated significant language-based differences among the evaluated models. Readability, GQS, and JAMA assessments revealed statistically significant differences between the chatbots in both English and Turkish responses (p < 0.05). In the GQS evaluation, Gemini achieved the highest quality scores in English, while ChatGPT, ChatGPT-4o, and Claude produced the highest scores in Turkish (p < 0.05). In terms of JAMA reliability, Gemini and Perplexity showed the highest performance in English responses, whereas Perplexity demonstrated significantly higher reliability than the other platforms in Turkish (p < 0.05). Regarding readability, Perplexity generated the most difficult-to-read content in both languages, whereas ChatGPT provided the most readable responses (p < 0.05). The 7-day assessment showed no significant day-to-day changes in readability, quality, or reliability scores for most chatbots, indicating that their performance remained largely stable over time (p > 0.05).

This study demonstrated that language differences have a significant impact on the readability, quality, and reliability of information provided by AI-based chatbots regarding vital pulp therapy. These findings suggest that such systems may serve as supportive tools for accessing clinical information; however, expert oversight remains essential to ensure the accuracy and quality of the content. Future studies should include a wider variety of languages and chatbot models, along with extended evaluation periods.
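The two readability measures named in the abstract can be illustrated with a short sketch. It uses the commonly published forms of the Flesch Reading Ease Score and the Ateşman formula; the abstract does not state how the authors counted words, sentences, or syllables, so the function names and the vowel-based Turkish syllable counter below are illustrative assumptions, not the study's procedure.

```python
# Hypothetical helper names; formulas are the commonly published forms,
# not code taken from the study.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score for English text (higher = easier)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def atesman_readability(words: int, sentences: int, syllables: int) -> float:
    """Atesman readability score for Turkish text (higher = easier)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)


# Assumption: every Turkish syllable contains exactly one vowel, so counting
# vowels approximates the syllable count for ordinary Turkish text.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def count_turkish_syllables(text: str) -> int:
    """Approximate the number of syllables by counting vowels."""
    return sum(1 for ch in text if ch in TURKISH_VOWELS)
```

Both scores fall as sentences get longer and words gain syllables, but the two formulas weight those factors differently, so an English FRES value and a Turkish Ateşman value are not directly interchangeable numbers.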

Topics

Artificial Intelligence in Healthcare and Education · Dental Research and COVID-19 · Electronic Health Records Systems
