OpenAlex · Updated hourly · Last updated: 23.03.2026, 20:28

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating ChatGPT-4 in Otolaryngology–Head and Neck Surgery Board Examination using the CVSA Model

2023 · 16 citations · Open Access

Citations: 16 · Authors: 8 · Year: 2023

Abstract

Background: ChatGPT is among the most popular large language models (LLMs) and has shown proficiency on various standardized tests, including multiple-choice medical board examinations. However, its performance on Otolaryngology–Head and Neck Surgery (OHNS) board examinations and on open-ended medical board examinations has not been reported. We present the first evaluation of an LLM (ChatGPT-4) on such examinations and propose a novel method for assessing an artificial intelligence (AI) model's performance on open-ended medical board examination questions.

Methods: Twenty-one open-ended questions were adopted from the Royal College of Physicians and Surgeons of Canada's sample exam and used to query ChatGPT-4 on April 11th, 2023, with and without prompts. A new CVSA (concordance, validity, safety, and accuracy) model was developed to evaluate its performance.

Results: On the open-ended questions, ChatGPT-4 achieved a passing mark (an average of 75% across three trials). The model demonstrated high concordance (92.06%) and satisfactory validity. Although it was considerably consistent when regenerating answers, it often provided only partially correct responses. Notably, concerning features such as hallucinations and self-conflicting answers were observed.

Conclusion: ChatGPT-4 achieved a passing score on the sample exam and demonstrated the potential to pass the Canadian Otolaryngology–Head and Neck Surgery Royal College board examination. Concerns remain because its hallucinations could pose risks to patient safety. Further adjustments are necessary to yield safer and more accurate answers for clinical implementation.


Topics

Artificial Intelligence in Healthcare and Education
Cardiac, Anesthesia and Surgical Outcomes
Radiomics and Machine Learning in Medical Imaging