This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative Analysis of ChatGPT-4o and Gemini Advanced Performance on Diagnostic Radiology In-Training Exams
Citations: 3
Authors: 4
Year: 2025
Abstract
Background
The increasing integration of artificial intelligence (AI) in medical education and clinical practice has led to growing interest in large language models (LLMs) for diagnostic reasoning and training. LLMs have demonstrated potential in interpreting medical text, summarizing findings, and answering radiology-related questions. However, their ability to accurately analyze both written and image-based content in radiology remains uncertain, even with newer models. This study evaluates the performance of OpenAI's Chat Generative Pre-trained Transformer 4o (ChatGPT-4o) and Google DeepMind's Gemini Advanced on the 2022 American College of Radiology (ACR) Diagnostic Radiology In-Training (DXIT) Exam to assess their capabilities across different radiological subfields.

Methods
ChatGPT-4o and Gemini Advanced were tested on 106 multiple-choice questions from the 2022 DXIT exam, comprising both image-based and written questions spanning various radiological subspecialties. Performance was compared using overall accuracy, subfield-specific accuracy, and two-proportion z-tests to determine significant differences.

Results
ChatGPT-4o achieved an overall accuracy of 69.8% (74/106), outperforming Gemini Advanced, which scored 60.4% (64/106), although the difference was not statistically significant (p = 0.151). On image-based questions (n = 64), ChatGPT-4o performed better (57.8%, 37/64) than Gemini Advanced (43.8%, 28/64). On written questions (n = 42), the two models demonstrated similar accuracy (88.1% vs. 85.7%). ChatGPT-4o showed stronger performance in specific subfields, such as cardiac and nuclear radiology, but neither model was consistently superior across all radiology domains.

Conclusion
LLMs show promise in radiology education and diagnostic reasoning, particularly for text-based assessments. However, limitations such as inconsistent responses and lower accuracy in image interpretation highlight the need for further refinement. Future research should focus on improving AI models' reliability, multimodal capabilities, and integration into radiology training programs.
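As a sanity check on the reported comparison, the overall accuracies above can be compared with a standard two-proportion z-test using the pooled standard error. The sketch below uses only the Python standard library; the exact procedure the authors used is not specified in the abstract, so this is one common formulation, and small differences from the reported p = 0.151 may arise from rounding or a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pool the two samples to estimate the common proportion under H0.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Overall accuracy: ChatGPT-4o 74/106 vs. Gemini Advanced 64/106
z, p = two_proportion_z_test(74, 106, 64, 106)
print(f"z = {z:.3f}, p = {p:.3f}")  # p ≈ 0.15, consistent with the reported p = 0.151
```

With these counts the test yields z ≈ 1.44 and p ≈ 0.15, which agrees with the abstract's conclusion that the overall difference is not statistically significant at the conventional 0.05 level.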
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations