OpenAlex · Updated hourly · Last updated: 14.03.2026, 17:18

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQA


Citations: 0 · Authors: 8 · Year: 2025

Abstract

Medical Visual Question Answering (Medical VQA) plays an important role in medical informatics. However, the robustness of existing medical VQA models is severely challenged by adversarial attacks. Current methods (e.g., adversarial training and noise-based reasoning) rely heavily on additional data or complex procedures and often ignore model-level robustness. To address these issues, we propose the Multimodal Variational Masked Autoencoder (MVMAE), a novel pre-training framework designed to enhance the robustness of the medical VQA task. MVMAE leverages masked modeling and variational inference to extract robust multimodal features. The framework introduces a low-cost multimodal bottleneck fusion module and employs reparameterization to sample robust latent representations, ensuring effective feature fusion and reconstruction. Extensive experiments on public medical VQA datasets demonstrate that MVMAE significantly improves resistance to various adversarial attacks and outperforms other medical multimodal pre-training methods.
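As a rough illustration of the mechanism the abstract describes (low-cost bottleneck fusion of image and text features, followed by reparameterized sampling of a latent representation), here is a minimal PyTorch sketch. It is not the authors' implementation: module names such as BottleneckFusion and VariationalHead, all dimensions, the pooling step, and the toy inputs are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class BottleneckFusion(nn.Module):
    """Hypothetical low-cost fusion: a small set of learned bottleneck
    tokens attends to the concatenated image/text tokens, so cross-modal
    exchange is limited to a few tokens (dimensions are illustrative)."""
    def __init__(self, dim=256, num_bottleneck=4, num_heads=4):
        super().__init__()
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        context = torch.cat([img_tokens, txt_tokens], dim=1)
        queries = self.bottleneck.expand(context.size(0), -1, -1)
        fused, _ = self.attn(queries, context, context)
        return fused

class VariationalHead(nn.Module):
    """Standard reparameterization trick: predict (mu, logvar) from the
    fused features and sample z = mu + sigma * eps, with a KL penalty
    against a standard-normal prior."""
    def __init__(self, dim=256, latent_dim=128):
        super().__init__()
        self.mu = nn.Linear(dim, latent_dim)
        self.logvar = nn.Linear(dim, latent_dim)

    def forward(self, fused):
        pooled = fused.mean(dim=1)            # pool bottleneck tokens
        mu, logvar = self.mu(pooled), self.logvar(pooled)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterized sample
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

# Toy usage with made-up shapes: masked image patch tokens and
# question token embeddings of matching width.
img = torch.randn(2, 49, 256)
txt = torch.randn(2, 20, 256)
fused = BottleneckFusion()(img, txt)
z, kl = VariationalHead()(fused)
print(z.shape, kl.item())  # torch.Size([2, 128]) and a scalar KL term
```

The design intent this sketch mirrors is that routing fusion through a handful of bottleneck tokens keeps cross-modal attention cheap, and sampling z rather than using a deterministic embedding is what reparameterization enables; every concrete detail beyond that is a guess, not the paper's architecture.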

Topics

COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education · Advanced Neural Network Applications