This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation
Citations: 0 · Authors: 2 · Year: 2024
Abstract
We follow the topic entitled "Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation" [1]. The purpose of this study was to evaluate ChatGPT-4's performance on the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and its potential as a tool for providing feedback on the examination's quality. Two independent physicians entered all examination questions into ChatGPT-4 and compared its responses to the test solutions, classifying each response as adequate, inadequate, or indeterminate. Disagreements were resolved by consensus. Statistical analysis was used to evaluate performance across medical themes and on nullified questions. On the Revalida examination, ChatGPT-4 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. The proportion of correct responses did not differ statistically significantly across medical themes. On nullified questions the model had a lower accuracy of 71.4%, but there was no statistical difference between the non-nullified and nullified groups. A potential weakness of this study is its reliance on the judgments of only two independent physicians to evaluate the accuracy of ChatGPT-4, which raises the likelihood of subjective bias in their evaluations. Furthermore, the study does not provide detailed criteria for categorizing the model's replies as adequate, inadequate, or indeterminate, which may impair the evaluation's credibility.
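The comparison between nullified and non-nullified questions can be illustrated with a two-sided Fisher's exact test on a 2×2 table. This is a sketch, not the study's actual analysis: the abstract does not state which test was used, and the nullified-group counts (5 correct of 7, matching 71.4%) are an assumption inferred from the reported percentages, with the non-nullified counts (66 of 74) derived from the overall totals of 71 correct and 10 incorrect.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are at most as probable as the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    def p_table(x):
        # Probability of a table with x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)
    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs + 1e-12)

# Assumed counts inferred from the abstract (71.4% taken as 5/7):
# nullified: 5 correct, 2 incorrect; non-nullified: 66 correct, 8 incorrect
p = fisher_exact_2x2(5, 2, 66, 8)
print(f"p = {p:.3f}")
```

A p-value above 0.05 under these assumed counts would be consistent with the abstract's finding of no statistical difference between the two groups.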
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations