OpenAlex · Updated hourly · Last updated: Mar 20, 2026, 05:21

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance and Errors of ChatGPT-4o on the Japanese Medical Licensing Examination: Solving All Questions Including Images with Over 90% Accuracy (Preprint)

2024 · 5 citations · Open Access
Open full text at publisher

5

Citations

8

Authors

2024

Year

Abstract

<sec> <title>BACKGROUND</title> Recent advancements in AI technology have begun to play a crucial role in medical education. AI models, such as ChatGPT, have shown promise in various applications, including answering medical questions and assisting in clinical decision-making. However, there is limited research on the performance of these models on comprehensive medical licensing exams. </sec> <sec> <title>OBJECTIVE</title> This study aims to evaluate the performance of ChatGPT-4o on the 118th Japanese Medical Licensing Examination (JMLE), specifically assessing its ability to handle both text-based and image-based questions, and to analyze the types of errors it makes. </sec> <sec> <title>METHODS</title> ChatGPT-4o was used to complete all 400 questions of the 118th JMLE, held in February 2024. The model, trained on data up to May 13, 2023, was assessed on its ability to answer both text-only and image-based questions. Questions were input directly into the chat interface without prompt engineering or memory functions. Because of the daily response limit of ChatGPT-4o, the study was conducted from May 13 to May 19, 2024. An independent samples t-test compared the correct response rates between image-based and text-only questions. Statistical significance was set at P&lt;.05 for all two-tailed tests. </sec> <sec> <title>RESULTS</title> ChatGPT-4o achieved an overall correct response rate of 93.25%, with 93.48% for image-based and 93.18% for text-only questions. The difference in correct response rates between text-only and image-based questions was not statistically significant (t=-0.074, P=.941). The errors were classified into four categories: diagnostic errors, logical errors, medical knowledge errors, and reading comprehension errors. </sec> <sec> <title>DISCUSSION</title> ChatGPT-4o demonstrated high proficiency in both text-centric and image-based questions, marking a significant improvement over previous iterations of GPT models. 
This performance meets the passing criteria set by the Ministry of Health, Labour and Welfare for the JMLE, which requires a total score of at least 160/200 points on compulsory questions, at least 230/300 points on non-compulsory questions, and no more than 3 incorrect choices among critical exclusion questions. Although ChatGPT-4o met the overall passing criteria, some responses indicated potentially problematic clinical judgments, such as incorrect triage decisions and prioritization errors in clinical scenarios. These findings underscore the need for improved clinical judgment capabilities in AI models. </sec> <sec> <title>CONCLUSIONS</title> ChatGPT-4o successfully met the passing criteria for the 118th JMLE, demonstrating high proficiency in handling both text-based and image-based questions. This marks a significant improvement over previous iterations of GPT models, particularly in managing multimodal tasks. The model excelled in answering specific medical knowledge questions, indicating a strong grasp of medical facts and concepts. However, it struggled with clinical judgment and prioritization, as evidenced by errors in triage decisions and in the selection of appropriate diagnostic procedures. These findings highlight the need for continued enhancement of AI models to ensure their reliability and accuracy in clinical decision-making. While generative AI like ChatGPT-4o shows great potential, understanding and addressing its limitations will be critical for its effective integration into medical education and practice. </sec>
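The comparison of image-based versus text-only correct response rates reported above (t = -0.074, P = .941) rests on an independent samples t-test over per-question 0/1 correctness scores. A minimal sketch of such a statistic, assuming Welch's (unequal-variance) form and purely illustrative correctness vectors that are not the study's data:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples of 0/1 scores."""
    na, nb = len(a), len(b)
    # Sample variances of each group (n - 1 denominator).
    va, vb = statistics.variance(a), statistics.variance(b)
    # Standard error of the difference between the two group means.
    se = math.sqrt(va / na + vb / nb)
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical correctness vectors (1 = correct, 0 = incorrect);
# the abstract does not report the per-group question counts.
image_based = [1, 1, 1, 0]
text_only = [1, 0, 0, 0]
t = welch_t(image_based, text_only)
```

A t statistic near zero, as in the study, indicates that the two groups' accuracy rates are nearly identical relative to their sampling variability.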
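The three passing thresholds quoted in the Discussion can be expressed as a simple rule check. A sketch, with a hypothetical helper name and the thresholds taken directly from the abstract:

```python
def passes_jmle(compulsory_pts, general_pts, exclusion_errors):
    """Apply the JMLE passing thresholds quoted in the abstract.

    compulsory_pts:   score on compulsory questions (out of 200; pass >= 160)
    general_pts:      score on non-compulsory questions (out of 300; pass >= 230)
    exclusion_errors: incorrect choices on critical exclusion questions (pass <= 3)
    """
    return (compulsory_pts >= 160
            and general_pts >= 230
            and exclusion_errors <= 3)
```

All three conditions must hold simultaneously; a high total score cannot compensate for more than 3 errors on the critical exclusion questions.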

Related works

Authors

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging