This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of Large Language Models in Recognizing Brain MRI Sequences: A Comparative Analysis of ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro
Citations: 4
Authors: 2
Year: 2025
Abstract
Background/Objectives: Multimodal large language models (LLMs) are increasingly used in radiology. However, their ability to recognize fundamental imaging features, including modality, anatomical region, imaging plane, contrast-enhancement status, and particularly specific magnetic resonance imaging (MRI) sequences, remains underexplored. This study aims to evaluate and compare the performance of three advanced multimodal LLMs (ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro) in classifying brain MRI sequences.

Methods: A total of 130 brain MRI images from adult patients without pathological findings were used, representing 13 standard MRI series. Models were tested with zero-shot prompts to identify modality, anatomical region, imaging plane, contrast-enhancement status, and MRI sequence. Accuracy was calculated, and differences among models were analyzed using Cochran's Q test and the McNemar test with Bonferroni correction.

Results: ChatGPT-4o and Gemini 2.5 Pro achieved 100% accuracy in identifying the imaging plane and 98.46% in identifying contrast-enhancement status. MRI sequence classification accuracy was 97.7% for ChatGPT-4o, 93.1% for Gemini 2.5 Pro, and 73.1% for Claude 4 Opus (p < 0.001). The most frequent misclassifications involved fluid-attenuated inversion recovery (FLAIR) sequences, often mistaken for T1-weighted or diffusion-weighted sequences. Claude 4 Opus showed lower accuracy on susceptibility-weighted imaging (SWI) and apparent diffusion coefficient (ADC) sequences. Gemini 2.5 Pro exhibited occasional hallucinations, including irrelevant clinical details such as "hypoglycemia" and "Susac syndrome."

Conclusions: Multimodal LLMs demonstrate high accuracy on basic MRI recognition tasks but vary significantly on specific sequence classification. The observed hallucinations warrant caution in clinical use, underlining the need for validation, transparency, and expert oversight.
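The statistical comparison described in the Methods can be outlined with standard tools. Below is a minimal sketch (not the authors' code) in Python using statsmodels, assuming a hypothetical 130 × 3 binary matrix of per-image correctness scores; the data here are randomly generated placeholders.

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar
from statsmodels.stats.multitest import multipletests

models = ["ChatGPT-4o", "Claude 4 Opus", "Gemini 2.5 Pro"]

# Hypothetical stand-in data: 130 images x 3 models,
# 1 = sequence classified correctly, 0 = misclassified.
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=(130, 3))

# Cochran's Q test: do the three paired accuracies differ overall?
q = cochrans_q(correct)
print(f"Cochran's Q = {q.statistic:.2f}, p = {q.pvalue:.4f}")

# Pairwise McNemar tests on 2x2 agreement tables,
# followed by Bonferroni correction over the three comparisons.
pairs = [(0, 1), (0, 2), (1, 2)]
pvals = []
for i, j in pairs:
    table = np.array([
        [np.sum((correct[:, i] == 1) & (correct[:, j] == 1)),
         np.sum((correct[:, i] == 1) & (correct[:, j] == 0))],
        [np.sum((correct[:, i] == 0) & (correct[:, j] == 1)),
         np.sum((correct[:, i] == 0) & (correct[:, j] == 0))],
    ])
    pvals.append(mcnemar(table, exact=True).pvalue)

_, p_adj, _, _ = multipletests(pvals, method="bonferroni")
for (i, j), p in zip(pairs, p_adj):
    print(f"{models[i]} vs {models[j]}: adjusted p = {p:.4f}")
```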
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 citations