This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Multimodal Large Language Models are Generalist Medical Image Interpreters
Citations: 15
Authors: 6
Year: 2023
Abstract
Medicine is undergoing a transformation with the integration of Artificial Intelligence (AI). Traditional AI models, though clinically useful and often matching or surpassing expert clinicians in specific tasks, face a scalability challenge due to the necessity of developing individual models for each task. Therefore, there is a push towards foundation models that are applicable to a wider set of tasks. Our study showcases how non-domain-specific, publicly available vision-language models can be employed as general foundation models for medical applications. We test our paradigm across four medical disciplines - pathology, dermatology, ophthalmology, and radiology - focusing on two use-cases within each discipline. We find that our approach outperforms existing pre-training methods and is competitive with domain-specific foundation models that require vast amounts of domain-specific training images. We also find that large vision-language models are data efficient and do not require large annotated datasets to reach competitive performance. This allows for the development of new or improved AI models in areas of medicine where data is scarce and will accelerate medical progress towards true multimodal foundation models.
Related Work
A survey on deep learning in medical image analysis
2017 · 13,526 citations
Dermatologist-level classification of skin cancer with deep neural networks
2017 · 13,148 citations
A survey on Image Data Augmentation for Deep Learning
2019 · 11,758 citations
QuPath: Open source software for digital pathology image analysis
2017 · 8,122 citations
Radiomics: Images Are More than Pictures, They Are Data
2015 · 7,991 citations