Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Performance Evaluation of Multimodal Large Language Models (LLaVA and GPT-4-based ChatGPT) in Medical Image Classification Tasks

2024·14 Zitationen

Volltext beim Verlag öffnen

Zitationen

Autoren

2024

Jahr

Abstract

Large language models (LLMs) have gained significant attention due to their prospective applications in medicine. Utilizing multimodal LLMs can potentially assist clinicians in medical image classification tasks. It is important to evaluate the performance of LLMs in medical image processing to potentially improve the medical system. We evaluated two multimodal LLMs (LLaVA and GPT-4-based ChatGPT) against the classic VGG in tumor classification across brain MRI, breast ultrasound, and kidney CT datasets. Despite LLMs facing significant hallucination issue in medical imaging, prompt engineering markedly enhanced their performance. In comparison to the baseline method, GPT-4-based ChatGPT with prompt engineering achieves 98%, 112%, and 69% of the baseline's performance in terms of accuracy (or 99%, 107%, and 62 % in terms of F1-score) in those three datasets, respectively. However, privacy, bias, accountability, and transparency concerns necessitate caution. Our study underscore LLMs' potential in medical imaging but emphasize the need for thorough performance and safety evaluations for their practical application.

Autoren

Institutionen

Themen

Topic ModelingArtificial Intelligence in Healthcare and EducationCOVID-19 diagnosis using AI

Volltext beim Verlag öffnen

Performance Evaluation of Multimodal Large Language Models (LLaVA and GPT-4-based ChatGPT) in Medical Image Classification Tasks

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen