Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Limitations in Chest X‐Ray Interpretation by Vision‐Capable Large Language Models, Gemini 1.0, Gemini 1.5 Pro, GPT‐4 Turbo, and GPT‐4o

2025·0 Zitationen·Preprints.orgOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

Background/Objectives: Interpretation of chest X-rays (CXRs) requires accurate identification of lesion presence, diagnosis, location, size, and number to be considered complete. However, the effectiveness of large language models with vision capabilities (vLLMs) in performing these tasks remains uncertain. This study aimed to evaluate the image interpretation performance of vLLMs in the absence of clinical information. Methods: A total of 247 CXRs covering 13 diagnoses, such as pulmonary edema, cardiomegaly, lobar pneumonia, and other medical conditions, were evaluated using Gemini 1.0, Gemini 1.5 Pro, GPT-4 Turbo, and GPT-4o. The text outputs generated by the vLLMs were assessed for diagnostic accuracy and identification of key imaging features. Each interpretation was classified as fully correct, partially correct, or incorrect according to the criteria for complete interpretation. Results: When both fully and partially correct responses were considered as successful detections, vLLMs effectively identified large, bilateral, multiple lesions and big devices, such as acute pulmonary edema (53.8%), lobar pneumonia (55%), multiple malignancies (55%), massive pleural effusions (47.5%) and pacemakers (98.3%), showing significant differences in the chi-square test. Feature descriptions varied among models, especially in posteroanterior and anteroposterior views and side markers, though central lines were partially recognized. Gemini 1.5 Pro (49.0%) performed best, followed by Gemini 1.0 (43.8%), GPT-4o (32.0%), and GPT-4Turbo (20.0%). Conclusions: Although vLLMs were able to identify certain diagnoses and key imaging features, their limitations in detecting small lesions, recognizing laterality, reasoning through differential diagnoses, and using domain-specific expressions indicate that CXR interpretation without textual cues still requires further improvement.

Autoren

Themen

COVID-19 diagnosis using AIArtificial Intelligence in Healthcare and EducationRadiology practices and education

Volltext beim Verlag öffnen

Limitations in Chest X‐Ray Interpretation by Vision‐Capable Large Language Models, Gemini 1.0, Gemini 1.5 Pro, GPT‐4 Turbo, and GPT‐4o

Abstract

Ähnliche Arbeiten

Autoren

Themen