OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 16.05.2026, 11:03

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Performance of GPT-4 Turbo and GPT-4o in Korean Society of Radiology In-Training Examinations

2025·6 Zitationen·Korean Journal of RadiologyOpen Access
Volltext beim Verlag öffnen

6

Zitationen

6

Autoren

2025

Jahr

Abstract

OBJECTIVE: Despite the potential of large language models for radiology training, their ability to handle image-based radiological questions remains poorly understood. This study aimed to evaluate the performance of the GPT-4 Turbo and GPT-4o in radiology resident examinations, to analyze differences across question types, and to compare their results with those of residents at different levels. MATERIALS AND METHODS: A total of 776 multiple-choice questions from the Korean Society of Radiology In-Training Examinations were used, forming two question sets: one originally written in Korean and the other translated into English. We evaluated the performance of GPT-4 Turbo (gpt-4-turbo-2024-04-09) and GPT-4o (gpt-4o-2024-11-20) on these questions with the temperature set to zero, determining the accuracy based on the majority vote from five independent trials. We analyzed their results using the question type (text-only vs. image-based) and benchmarked them against nationwide radiology residents' performance. The impact of the input language (Korean or English) on model performance was examined. RESULTS: ≥ 0.275). CONCLUSION: GPT-4o outperformed the GPT-4 Turbo in all question types. On image-based questions, both models' performance matched that of 1st-year residents but was lower than that of higher-year residents. Both models demonstrated superior performance compared to residents for text-only questions. The models showed consistent performances across English and Korean inputs.

Ähnliche Arbeiten