This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Vision-language zero-shot models for radiographic image classification: A systematic review
Citations: 0
Authors: 3
Year: 2025
Abstract
Zero-shot Vision-Language Models (VLMs) link visual and textual features, enabling generalization to unseen domains, making them promising for radiographic diagnosis, though clinical adoption is limited. This systematic review examines zero-shot VLMs applied to radiographic image classification, following the PRISMA methodology. Articles were identified from IEEE, PubMed, Scopus, and Web of Science, with 16 selected after exhaustive screening. The analysis addressed five research questions (RQ1–RQ5) covering dataset characteristics, model attributes, natural language integration, reported limitations, and hyperparameter tuning. Geographically, China (37%) and the United States (38%) contributed 75% of the reviewed studies, with no EU-led research identified, highlighting the need for increased European engagement in this field. Architecturally (RQ2), high heterogeneity exists, with dual-encoder (43.75%) and attention-based fusion models most common. Most models (81.25%) employ a Joint Embedding Space for multimodal alignment. Regarding datasets and natural language use (RQ1, RQ3), VLMs rely on few large but semantically narrow datasets, limiting generalizability and amplifying bias. Real clinical reports (direct supervision) and implicit pretrained textual embeddings each represent 37.5% of strategies, yet unstructured clinical text is underutilized. Limited vision-language integration negatively affects performance and explainability (RQ4). Hyperparameter tuning (RQ5) is rarely reported, with 9 of 16 studies not specifying methods, compromising reproducibility. There is an urgent need for open, multilingual, multimodal datasets reflecting clinical and geographic diversity. Clinically useful zero-shot VLMs require transparent evaluation, including explainability metrics. Future models should adopt a multidisciplinary approach, combining technical innovation with usability, data representativeness, and methodological transparency to ensure diagnostic robustness. 
• Systematic review of zero-shot VLMs for radiographic image classification.
• No studies led by EU institutions despite strong US and China output.
• Current VLMs rely on narrow datasets, limiting generalizability and fairness.
• Limited text–image integration hinders performance and explainability.
• Call for open, multilingual, multimodal datasets and transparent evaluation.
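The abstract notes that most reviewed models align images and text in a joint embedding space. As a rough illustration of how zero-shot classification can work in such a space, the sketch below scores one image embedding against text-prompt embeddings by cosine similarity and applies a softmax. It is not taken from any of the reviewed studies: the three-dimensional vectors, the prompt wording, the function names, and the temperature value of 0.07 are all assumptions chosen for illustration, and a real system would obtain the embeddings from trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Return a probability per prompt via softmax over scaled cosine similarities.

    `temperature` is an assumed scaling constant, not a value from the review.
    """
    labels = list(prompt_embeddings)
    logits = [cosine(image_embedding, prompt_embeddings[label]) / temperature
              for label in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Hypothetical toy embeddings standing in for encoder outputs.
prompt_embeddings = {
    "a chest radiograph showing pneumonia": [0.9, 0.1, 0.2],
    "a chest radiograph with no finding": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.2, 0.3]

probs = zero_shot_classify(image_embedding, prompt_embeddings)
predicted = max(probs, key=probs.get)
print(predicted)
```

Because the class labels enter only as text prompts, new classes can be added by writing new prompts without retraining, which is the property that makes the approach "zero-shot".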
Related works
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
2020 · 22,609 citations
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19,271 citations
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14,255 citations
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8,516 citations
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7,124 citations