OpenAlex · Updated hourly · Last updated: 15.03.2026, 03:23

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of Artificial Intelligence Chatbot on the Canadian Otolaryngology in-Training Exam: Unlocking Insights on the Intersection of Technology and Education

2025 · 0 citations · Journal of Otolaryngology - Head and Neck Surgery · Open Access

0 citations · 4 authors · 2025

Abstract

Importance: The performance of large language models has been compared to that of physicians.

Objective: To evaluate the performance of ChatGPT-4 in the field of otolaryngology and head and neck surgery (OTOHNS) residency training.

Design: Observational.

Setting: Virtual.

Participants: ChatGPT-4.

Interventions: All questions from the OTOHNS National In-Training Exam (NITE) for 2022 and 2023 were submitted to ChatGPT-4. Answers were graded by 2 reviewers using the official grading rubric, and the average score was used. Mean exam results from residents who have taken this exam were obtained from the lead faculty.

Main Outcome Measures: <i>Z</i>-tests were used to compare ChatGPT-4's performance to that of residents. The questions were categorized by type (image or text), task, subspecialty, taxonomic level, and prompt length.

Results: ChatGPT-4 scored 66% (350/529) and 65% (243/374) on the 2022 and 2023 exams, respectively. ChatGPT-4 outperformed the residents on both exams, among all training levels and within all subspecialties except for the general/pediatrics section of the 2023 exam (<i>Z</i>-test -2.54). For the 2022 exam, ChatGPT-4 would rank in the 99th percentile among post-graduate year (PGY)-2 and the 73rd percentile among PGY-4 classmates. For the 2023 exam, it would rank in the 99th percentile among PGY-2 and the 71st percentile among PGY-4 classmates. ChatGPT-4 performed best on text-based questions (74%, <i>P</i> < .001) with an effect size of 1.27 (confidence interval (CI): 0.99-1.55), level 1 taxonomic questions (75%, <i>P</i> < .001) with an effect size of 0.084 (CI: 0.03-0.14), and guideline-based questions (70%, <i>P</i> = .048) with an effect size of 0.11 (CI: 0-0.23). There was no significant difference in performance based on subspecialty (<i>P</i> = .36) or prompt length (<i>P</i> = .39).

Conclusions: ChatGPT-4 not only achieved passing grades on 2 versions of the Canadian OTOHNS NITE, but it also significantly outperformed residents.

Relevance: This study underscores a critical need to redesign residency assessment methods.
