OpenAlex · Updated hourly · Last updated: Mar 14, 2026, 00:48

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparing the Performance of ChatGPT and GPT-4 versus a Cohort of Medical Students on an Official University of Toronto Undergraduate Medical Education Progress Test

2023 · 8 citations · Open Access
Open full text at the publisher

Citations: 8 · Authors: 8 · Year: 2023

Abstract

Background: Large language model (LLM)-based chatbots have recently seen broad public uptake, demonstrating remarkable abilities in natural language understanding, natural language generation, dialogue, and logic/reasoning.

Objective: To compare the performance of two LLM-based chatbots against a cohort of medical students on a University of Toronto undergraduate medical progress test.

Methods: We report the mean number of correct responses, stratified by year of training/education, for each cohort of undergraduate medical students, and the counts/percentages of correctly answered test questions for each of ChatGPT and GPT-4. We compare the performance of ChatGPT versus GPT-4 using McNemar's test for dependent proportions, and assess whether the percentage of correctly answered questions for ChatGPT or GPT-4 falls within or outside the confidence intervals for the mean number of correct responses for each cohort of undergraduate medical students.

Results: A total of N=1,057 University of Toronto undergraduate medical students completed the progress test during the Fall 2022 and Winter 2023 semesters. Student performance improved with level of training/education: UME Year 1 mean = 36.3%; UME Year 2 mean = 44.1%; UME Year 3 mean = 52.2%; UME Year 4 mean = 58.5%. ChatGPT answered 68/100 (68.0%) questions correctly, whereas GPT-4 answered 79/100 (79.0%) questions correctly. GPT-4's performance was statistically significantly better than ChatGPT's (P=0.034), matching the top-performing undergraduate medical student (79/100 questions answered correctly).

Conclusions: This study adds to a growing body of literature demonstrating the remarkable performance of LLM-based chatbots on medical tests. GPT-4 performed at a level comparable to the best-performing undergraduate medical student who attempted the progress test in 2022/2023. Future work will investigate the potential application of LLM chatbots as tools for assisting learners and educators in medical education.
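The ChatGPT-vs-GPT-4 comparison uses McNemar's test for dependent proportions, which looks only at the discordant pairs: questions one model answered correctly and the other did not. A minimal sketch of the exact (binomial) form of the test follows; the discordant counts used in the illustration are hypothetical, since the abstract reports only the marginal scores (68/100 and 79/100), not the 2×2 breakdown.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pairs.

    b: items system A got right and system B got wrong
    c: items system A got wrong and system B got right
    Under H0 the discordant pairs split 50/50, so the smaller
    count follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    # Exact binomial tail probability, doubled for a two-sided test.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical illustration: of 100 shared questions, suppose 3 were
# answered correctly only by ChatGPT and 14 only by GPT-4 (these
# counts are NOT reported in the abstract).
print(round(mcnemar_exact(3, 14), 4))  # → 0.0127
```

Because both models answered the same 100 questions, a paired test like this is more appropriate than comparing two independent proportions.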


Topics

Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Machine Learning in Healthcare