This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative Analysis of ChatGPT-4o and Gemini Advanced Performance on Diagnostic Radiology In-Training Exams
Citations: 3
Authors: 4
Year: 2025
Abstract
Background
The increasing integration of artificial intelligence (AI) in medical education and clinical practice has led to growing interest in large language models (LLMs) for diagnostic reasoning and training. LLMs have demonstrated potential in interpreting medical text, summarizing findings, and answering radiology-related questions. However, their ability to accurately analyze both written and image-based content in radiology remains uncertain, even with newer models. This study evaluates the performance of OpenAI's Chat Generative Pre-trained Transformer 4o (ChatGPT-4o) and Google DeepMind's Gemini Advanced on the 2022 American College of Radiology (ACR) Diagnostic Radiology In-Training (DXIT) Exam to assess their capabilities across different radiological subfields.

Methods
ChatGPT-4o and Gemini Advanced were tested on 106 multiple-choice questions from the 2022 DXIT exam, comprising both image-based and written questions spanning various radiological subspecialties. Performance was compared using overall accuracy, subfield-specific accuracy, and two-proportion z-tests to determine significant differences.

Results
ChatGPT-4o achieved an overall accuracy of 69.8% (74/106), outperforming Gemini Advanced, which scored 60.4% (64/106), although the difference was not statistically significant (p = 0.151). On image-based questions (n = 64), ChatGPT-4o performed better (57.8%, 37/64) than Gemini Advanced (43.8%, 28/64). On written questions (n = 42), the two models demonstrated similar accuracy (88.1% vs. 85.7%). ChatGPT-4o showed stronger performance in specific subfields, such as cardiac and nuclear radiology, but neither model was consistently superior across all radiology domains.

Conclusion
LLMs show promise in radiology education and diagnostic reasoning, particularly for text-based assessments. However, limitations such as inconsistent responses and lower accuracy in image interpretation highlight the need for further refinement. Future research should focus on improving AI models' reliability, multimodal capabilities, and integration into radiology training programs.
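As a sanity check on the reported comparison, the overall accuracies above can be compared with a standard two-proportion z-test using the pooled standard error. The sketch below uses only the Python standard library; the exact procedure the authors used is not specified in the abstract, so this is one common formulation, and small differences from the reported p = 0.151 may arise from rounding or a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pool the two samples to estimate the common proportion under H0.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Overall accuracy: ChatGPT-4o 74/106 vs. Gemini Advanced 64/106
z, p = two_proportion_z_test(74, 106, 64, 106)
print(f"z = {z:.3f}, p = {p:.3f}")  # p ≈ 0.15, consistent with the reported p = 0.151
```

With these counts the test yields z ≈ 1.44 and p ≈ 0.15, which agrees with the abstract's conclusion that the overall difference is not statistically significant at the conventional 0.05 level.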
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations