This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance and Adversarial Vulnerability of Vision Language Models in Computer Tomography
0
Citations
5
Authors
2025
Year
Abstract
This study investigates the performance and vulnerability of Vision Language Models (VLMs) in interpreting computed tomography (CT). In a factorial experiment, four leading VLMs (Google, OpenAI, Anthropic, Alibaba) were used to classify 240 kidney CT scans into tumor, cyst, or normal categories under three prompt conditions. A ‘Neutral’ prompt requested simple interpretation, adversarial ‘Benign’ prompts aimed to mislead, and ‘Pressure’ prompts simulated clinical overload. Key performance metrics, including accuracy, precision, and recall, were evaluated. The overall 3-category classification accuracy under neutral conditions was 48.4%, with Google’s VLM achieving the highest individual accuracy (60.0%), followed by OpenAI (49.2%), Alibaba (42.5%), and Anthropic (42.1%). The introduction of adversarial prompts significantly degraded performance, with overall accuracy decreasing to 39.2% (p<0.001) under benign prompts and 43.3% (p=0.024) under pressure prompts. These prompts also induced significant prediction skews; for instance, pressure prompts systematically biased all models toward ‘normal’ classifications. In conclusion, current VLMs demonstrated modest accuracy for kidney CT classification and were highly vulnerable to adversarial manipulation. These findings raise critical concerns about their reliability and highlight the urgent need for extensive validation before any clinical implementation.
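The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `classification_metrics` and the toy labels are hypothetical, and the final check assumes every model classified the same number of scans, so that the overall neutral accuracy is simply the unweighted mean of the four reported per-model accuracies.

```python
from collections import Counter

CLASSES = ("tumor", "cyst", "normal")

def classification_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and per-class precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    correct = sum(pairs[(c, c)] for c in classes)
    accuracy = correct / len(y_true) if y_true else 0.0
    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]                               # correctly predicted as c
        predicted = sum(pairs[(t, c)] for t in classes)  # everything predicted as c
        actual = sum(pairs[(c, p)] for p in classes)     # everything truly c
        per_class[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, per_class

# Hypothetical toy labels for illustration only, not data from the study.
y_true = ["tumor", "tumor", "cyst", "cyst", "normal", "normal"]
y_pred = ["tumor", "normal", "cyst", "normal", "normal", "normal"]
accuracy, per_class = classification_metrics(y_true, y_pred)

# Per-model neutral-prompt accuracies (%) as reported in the abstract.
neutral_accuracy = {"Google": 60.0, "OpenAI": 49.2, "Alibaba": 42.5, "Anthropic": 42.1}
# Assumption: equal scan counts per model, so the overall value is a plain mean.
mean_neutral = sum(neutral_accuracy.values()) / len(neutral_accuracy)
```

In the toy data the predictions lean toward ‘normal’, so that class has perfect recall but low precision, which is the kind of skew the abstract describes for pressure prompts. Under the equal-count assumption, the mean of the four reported accuracies is about 48.45%, in line with the reported overall figure of 48.4%.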
Similar works
Rethinking the Inception Architecture for Computer Vision
2016 · 30,321 citations
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24,392 citations
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21,296 citations
CBAM: Convolutional Block Attention Module
2018 · 21,270 citations
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18,489 citations