OpenAlex · Updated hourly · Last updated: 15 March 2026, 03:23

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance and Adversarial Vulnerability of Vision Language Models in Computer Tomography

2025 · 0 citations · Open Access

Citations: 0 · Authors: 5 · Year: 2025

Abstract

This study investigates the performance and vulnerability of Vision Language Models (VLMs) in interpreting computed tomography (CT). In a factorial experiment, four leading VLMs (Google, OpenAI, Anthropic, Alibaba) were used to classify 240 kidney CT scans into tumor, cyst, or normal categories under three prompt conditions. A ‘Neutral’ prompt requested simple interpretation, while adversarial ‘Benign’ prompts aimed to mislead, and ‘Pressure’ prompts simulated clinical overload. Key performance metrics, including accuracy, precision, and recall, were evaluated. The overall 3-category classification accuracy under neutral conditions was 48.4%, with Google’s VLM achieving the highest individual accuracy (60.0%), followed by OpenAI (49.2%), Alibaba (42.5%), and Anthropic (42.1%). The introduction of adversarial prompts significantly degraded performance, with overall accuracy decreasing to 39.2% (p<0.001) under benign prompts and 43.3% (p=0.024) under pressure prompts. These prompts also induced significant prediction skews; for instance, pressure prompts systematically biased all models toward ‘normal’ classifications. In conclusion, current VLMs demonstrated modest accuracy for kidney CT classification and were highly vulnerable to adversarial manipulation. These findings raise critical concerns about their reliability and highlight the urgent need for extensive validation before any clinical implementation.
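As an illustration of the metrics named in the abstract, the following is a minimal sketch of how accuracy and per-class precision and recall could be computed for a three-category task. It is not the authors' evaluation code: the function name `classification_metrics`, the label strings, and the six toy labels are all assumptions made up for this example and are not data from the study.

```python
from collections import Counter

# Hypothetical label names; the study's actual encoding is not given.
CLASSES = ("tumor", "cyst", "normal")


def classification_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and per-class precision/recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count every (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))

    correct = sum(pairs[(c, c)] for c in classes)
    accuracy = correct / len(y_true)

    per_class = {}
    for c in classes:
        true_pos = pairs[(c, c)]
        predicted_as_c = sum(pairs[(t, c)] for t in classes)
        actually_c = sum(pairs[(c, p)] for p in classes)
        per_class[c] = {
            # Precision: of the scans predicted as c, how many were c.
            "precision": true_pos / predicted_as_c if predicted_as_c else 0.0,
            # Recall: of the scans that were c, how many were predicted as c.
            "recall": true_pos / actually_c if actually_c else 0.0,
        }
    return accuracy, per_class


# Toy labels (invented): predictions lean toward "normal".
y_true = ["tumor", "tumor", "cyst", "cyst", "normal", "normal"]
y_pred = ["tumor", "normal", "cyst", "normal", "normal", "normal"]

accuracy, per_class = classification_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for name, scores in per_class.items():
    print(name, scores)
```

In this toy case, a skew toward 'normal' predictions leaves recall for 'normal' high while lowering its precision and the recall of the other two classes, which is the kind of pattern a prediction skew would be expected to produce.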

Topics

Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education