OpenAlex · Updated hourly · Last updated: 12.03.2026, 03:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparative Evaluation of ChatGPT-4o and Grok-3 on Cleft Lip and Palate and Presurgical Infant Orthopedics: A Multidisciplinary Assessment by Orthodontists, Pediatricians, and Plastic Surgeons

2025 · 2 citations · The Cleft Palate-Craniofacial Journal

Citations: 2

Authors: 5

Year: 2025

Abstract

Objective: This study aimed to evaluate and compare the accuracy, clarity, and clinical applicability of 2 state-of-the-art large language models (LLMs), Chat Generative Pretrained Transformer (ChatGPT)-4o and Grok-3, in generating health information related to cleft lip and palate (CLP) and presurgical infant orthopedics (PSIO). To ensure a multidisciplinary perspective, experts from orthodontics, pediatrics, and plastic surgery independently evaluated the responses.

Methods: Six structured questions addressing general and presurgical aspects of CLP were submitted to both ChatGPT-4o and Grok-3. Forty-five blinded specialists (15 from each specialty) assessed the 12 generated responses using 2 validated instruments: the DISCERN tool and the Global Quality Scale (GQS). We conducted interspecialty comparisons to explore variations in model evaluation.

Results: We observed no statistically significant differences between ChatGPT-4o and Grok-3 in DISCERN or GQS scores (P > .05). However, pediatricians consistently assigned higher ratings than orthodontists and plastic surgeons in terms of reliability, clarity, and treatment-related content. Patient-directed questions received higher overall scores than those aimed at healthcare professionals. Grok-3 performed slightly better on questions about PSIO, whereas ChatGPT-4o provided more comprehensive and structured answers.

Conclusion: Both LLMs demonstrated notable potential in producing readable, informative responses about CLP and PSIO. While they may aid in patient communication and support clinical education, professional oversight remains critical to ensure medical accuracy. The inclusion of Grok-3 in this orthodontic evaluation provides valuable insights and sets the stage for future research on artificial intelligence integration in interdisciplinary cleft care.

Similar works