OpenAlex · Updated hourly · Last updated: 09.05.2026, 18:36

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

2024 · 0 citations · Revista da Associação Médica Brasileira · Open Access
Open full text at the publisher

Citations: 0 · Authors: 2 · Year: 2024

Abstract

We follow up on the article entitled "Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation" [1]. The purpose of that study was to evaluate ChatGPT-4's performance on the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and its potential as a tool for providing feedback on the examination's quality. Two independent physicians entered all examination questions into ChatGPT-4 and compared its responses to the answer key, classifying each response as adequate, inadequate, or indeterminate; disagreements were resolved by consensus. The study also used statistical analysis to compare performance across medical themes and between nullified and non-nullified questions. On the Revalida examination, ChatGPT-4 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. The proportion of correct responses did not differ significantly across medical themes. On nullified questions the model had a lower accuracy of 71.4%, but the difference between the nullified and non-nullified groups was not statistically significant. A potential weakness of the study is its reliance on the judgments of only two independent physicians to evaluate ChatGPT-4's accuracy, which raises the possibility of subjective bias in their evaluations. Furthermore, the study does not provide detailed information on the criteria used to classify the model's replies as adequate, inadequate, or indeterminate, which may limit the evaluation's credibility.
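
The abstract does not state which statistical test was used to compare the nullified and non-nullified groups. The sketch below shows one plausible way to run such a comparison in Python, assuming Fisher's exact test (a common choice for small counts). The nullified-group counts (5 correct, 2 incorrect) are hypothetical values chosen only because they reproduce the reported 71.4%; the abstract does not give the actual number of nullified questions.

    # Sketch of the kind of proportion comparison described in the abstract.
    # Reported overall counts: 71 correct, 10 incorrect (87.7% accuracy).
    # The nullified-group counts below are HYPOTHETICAL, chosen to match the
    # reported 71.4% accuracy; the test used in the study is also assumed.
    from scipy.stats import fisher_exact

    correct_total, incorrect_total = 71, 10          # reported overall counts
    correct_nullified, incorrect_nullified = 5, 2    # hypothetical nullified counts
    correct_valid = correct_total - correct_nullified
    incorrect_valid = incorrect_total - incorrect_nullified

    # 2x2 contingency table: rows = (non-nullified, nullified), cols = (correct, incorrect)
    table = [[correct_valid, incorrect_valid],
             [correct_nullified, incorrect_nullified]]
    odds_ratio, p_value = fisher_exact(table)

    print(f"Overall accuracy: {correct_total / (correct_total + incorrect_total):.1%}")
    print(f"Nullified-group accuracy: {correct_nullified / (correct_nullified + incorrect_nullified):.1%}")
    print(f"Fisher's exact test p-value: {p_value:.3f}")

With these illustrative counts the overall accuracy prints as 87.7% and the nullified-group accuracy as 71.4%, and the p-value would indicate whether the gap between groups is statistically significant, mirroring the comparison reported in the abstract.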

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills