Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Comparative Performance of Humans Versus GPT-4.0 and GPT-3.5 in the Self-assessment Program of American Academy of Ophthalmology

2023·4 Zitationen·Research Square (Research Square)Open Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2023

Jahr

Abstract

Abstract To compare the performance of humans, GPT-4.0 and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments. In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was analyzed for comparison. All questions were classified by 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Out of 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant difference in accuracy rates (always P < 0.0001). Both GPT-4.0 and GPT-3.5 showed the worst results in surgery-related questions (74.6% and 57.0% respectively). For difficult questions (answered incorrectly by > 50% of humans), both GPT models favorably compared to humans, without reaching significancy. The word count for answers provided by GPT-4.0 was significantly lower than those produced by GPT-3.5 (160 ± 56 and 206 ± 77 respectively, P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5, achieving better performance than humans in an AAO BCSC self-assessment test. However, ChatGPT is still limited by inconsistency across different practice areas, especially when it comes to surgery.

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationCardiac, Anesthesia and Surgical OutcomesClinical Reasoning and Diagnostic Skills

Volltext beim Verlag öffnen

Comparative Performance of Humans Versus GPT-4.0 and GPT-3.5 in the Self-assessment Program of American Academy of Ophthalmology

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen