This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Reliability and Performance of Four Large Language Models in Orthodontic Knowledge Assessment

2025 · 0 citations · Journal of Dental Education


Abstract

Artificial intelligence-based large language models (LLMs) are gaining prominence as educational tools. This study evaluated the accuracy and reliability of four popular, publicly available LLMs (ChatGPT-4.0, ChatGPT-4o, Google Gemini, and Microsoft CoPilot) in answering orthodontic questions from the National Board of Dental Examiners examinations. Each model was tested across three trials to assess response consistency, and reliability was analyzed using Cohen's kappa and Fleiss' kappa. Among the four models, Microsoft CoPilot demonstrated the highest reliability, while ChatGPT-4.0 had the highest accuracy. Variability across trials suggests that AI-generated responses remain inconsistent, and this variability over time limits the standalone applicability of LLMs in orthodontic education. Older models at times outperformed newer ones, so model updates do not necessarily lead to improved reliability. Although LLMs may show potential as supplementary study aids, their accuracy and stability require further refinement before they are deployed in educational contexts.
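The abstract names Cohen's and Fleiss' kappa but does not say exactly how they were applied. The following is a minimal sketch under one plausible assumption: each trial is treated as a "rater," each exam question as an item, and the answer option a model chose as the category. Cohen's kappa then compares two trials of the same model, and Fleiss' kappa summarizes agreement across all three trials. The function names and the six-question answer lists are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters (here: two trials) labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label in both trials.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each trial's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention for the degenerate case: one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(items):
    """Fleiss' kappa; `items` holds one list of labels per item, one label per rater."""
    if not items:
        raise ValueError("need at least one item")
    n = len(items[0])  # raters (trials) per item
    if n < 2 or any(len(r) != n for r in items):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N = len(items)
    counts = [Counter(r) for r in items]
    # Per-item agreement P_i = (sum_j n_ij^2 - n) / (n (n - 1)).
    p_items = [(sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts]
    p_bar = sum(p_items) / N
    # Overall category proportions p_j; expected agreement P_e = sum_j p_j^2.
    totals = Counter()
    for c in counts:
        totals.update(c)
    p_e = sum((v / (N * n)) ** 2 for v in totals.values())
    if p_e == 1.0:
        return 1.0  # degenerate: every rating used the same label
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical answer choices of one model on six questions in three trials.
trials = [
    ["A", "B", "C", "D", "A", "B"],
    ["A", "B", "C", "D", "A", "C"],
    ["A", "B", "D", "D", "A", "B"],
]

if __name__ == "__main__":
    for (i, a), (j, b) in combinations(enumerate(trials, start=1), 2):
        print(f"Cohen's kappa, trial {i} vs {j}: {cohen_kappa(a, b):.3f}")
    # Regroup by question: one list of three answers per item.
    per_question = [list(labels) for labels in zip(*trials)]
    print(f"Fleiss' kappa, all trials: {fleiss_kappa(per_question):.3f}")
```

Pairwise Cohen's kappa isolates drift between two specific runs, while Fleiss' kappa gives a single agreement figure across all runs. For real analyses, vetted implementations such as `sklearn.metrics.cohen_kappa_score` or `statsmodels.stats.inter_rater.fleiss_kappa` are preferable to hand-rolled code.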

Topics

Artificial Intelligence in Healthcare and Education; Dental Research and COVID-19; Dental Radiography and Imaging
