OpenAlex · Updated hourly · Last updated: 21.03.2026, 10:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of GPT-4 on the American College of Radiology In-Service Examination

2024 · 2 citations · Open Access

Citations: 2 · Authors: 9 · Year: 2024

Abstract

Objectives: No study has evaluated the ability of ChatGPT-4 to answer image-rich diagnostic radiology board examination questions or assessed model drift in GPT-4's image interpretation abilities. In this study, we evaluate GPT-4's performance on the American College of Radiology (ACR) 2022 Diagnostic Radiology In-Training Examination (DXIT).

Methods: Questions were input sequentially into GPT-4 with a standardized prompt. Each answer was recorded, and we calculated overall accuracy, logic-adjusted accuracy, and accuracy on image-based questions. The experiment was repeated several months later to assess model drift.

Results: GPT-4 achieved 58.5% overall accuracy, lower than the PGY-3 average (61.9%) but higher than the PGY-2 average (52.8%). Adjusted accuracy was 52.8%. GPT-4 reported significantly higher confidence (p = 0.012) for correct answers (87.1%) than for incorrect answers (84.0%). Performance on image-based questions was significantly poorer (p < 0.001), at 45.4% compared with 80.0% on text-only questions; adjusted accuracy on image-based questions was 36.4%. When the questions were repeated, GPT-4 chose a different answer 25.5% of the time, and accuracy decreased slightly, though not significantly.

Discussion: GPT-4 performed between PGY-2 and PGY-3 levels on the 2022 DXIT, but significantly worse on image-based questions, and its answer choices varied substantially across time points. This study underscores both the potential and the risks of using minimally prompted general-purpose AI models to interpret radiologic images as a diagnostic tool. Implementers of general AI radiology systems should exercise caution, given the possibility of spurious yet confident responses.

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education