This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The impact of language differences on the readability, quality, and reliability of information provided by artificial intelligence chatbots regarding vital pulp therapy: a cross-sectional study
Citations: 1
Authors: 2
Year: 2025
Abstract
The increasing use of artificial intelligence (AI) chatbots in healthcare has highlighted the need to evaluate the accuracy, reliability, and readability of the clinical information they provide. Vital pulp therapy is one of the fundamental biological approaches in modern dentistry aimed at preserving pulp vitality, and the quality of information related to this topic is highly important for clinical decision-making. The present study aimed to assess whether the readability, quality, and reliability of information provided by six different AI-based chatbots (ChatGPT, ChatGPT-4o, Gemini, Microsoft Copilot, Perplexity, and Claude) regarding vital pulp therapy vary depending on language differences. After a comprehensive literature review, 12 questions related to vital pulp therapy were developed. Each question was submitted to the chatbots in both Turkish and English for 7 consecutive days. The responses obtained in both languages were evaluated for readability using the Flesch Reading Ease Score (FRES) for English and the Ateşman Readability Formula for Turkish. Information quality was assessed using the Global Quality Scale (GQS), while reliability was evaluated based on the Journal of the American Medical Association (JAMA) benchmarks. Statistical analyses were performed using ANOVA, Bonferroni, and Chi-square tests, with a significance level set at p < 0.05. The findings demonstrated significant language-based differences among the evaluated models. Readability, GQS, and JAMA assessments revealed statistically significant differences between the chatbots in both English and Turkish responses (p < 0.05). In the GQS evaluation, Gemini achieved the highest quality scores in English, while ChatGPT, ChatGPT-4o, and Claude produced the highest scores in Turkish (p < 0.05). 
In terms of JAMA reliability, Gemini and Perplexity showed the highest performance in English responses, whereas Perplexity demonstrated significantly higher reliability than the other platforms in Turkish (p < 0.05). Regarding readability, Perplexity generated the most difficult-to-read content in both languages, whereas ChatGPT provided the most readable responses (p < 0.05). The 7-day assessment showed no significant day-to-day changes in readability, quality, or reliability scores for most chatbots, indicating that their performance remained largely stable over time (p > 0.05). This study demonstrated that language differences have a significant impact on the readability, quality, and reliability of information provided by AI-based chatbots regarding vital pulp therapy. These findings suggest that such systems may serve as supportive tools for accessing clinical information; however, expert oversight remains essential to ensure the accuracy and quality of the content. Future studies should include a wider variety of languages and chatbot models, along with extended evaluation periods.
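The abstract names the two readability formulas but does not state them. As a rough illustration only, the sketch below uses the commonly cited published coefficients for the Flesch Reading Ease Score and the Ateşman Readability Formula. The function names and the vowel-based Turkish syllable counter are assumptions made for this example; they are not taken from the study and do not represent the authors' scoring procedure.

```python
# Illustrative sketch; not the study's actual scoring pipeline.
# Both formulas use the same two ratios: words per sentence and syllables per word.

# Assumption: in Turkish, each syllable contains exactly one vowel, so counting
# vowels approximates the syllable count (circumflexed vowels included).
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def count_turkish_syllables(word: str) -> int:
    """Approximate the number of syllables in a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score for English text (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def atesman_readability(words: int, sentences: int, syllables: int) -> float:
    """Ateşman readability score for Turkish text (higher means easier)."""
    return 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)
```

Both scores rise as sentences and words get shorter, so a higher value indicates easier text in either language. The two scales use different coefficients, so scores from one formula are not directly comparable with scores from the other.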