This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Forecasting Alzheimer’s Disease Progression with Deep Multimodal Learning: Integration of 3D MRI and Tabular Clinical Records via a Large Vision-Language Model

2026 · 0 citations · medRxiv · Open Access

Abstract

Background: Accurate forecasting of Alzheimer’s Disease (AD) progression is critical for personalized patient management and clinical trial stratification. However, current predictive models often struggle to effectively integrate high-dimensional neuroimaging with longitudinal clinical data. We introduce AD-LLaVA-3D, a novel multimodal framework designed to bridge this gap by adapting large vision-language models for volumetric and temporal forecasting.

Methods: We leveraged the LLaVA-NeXT-Video architecture to treat 3D MRI volumes as temporal sequences, enabling the model to process volumetric imaging alongside longitudinal Tabular Clinical Records (TCR). The model was trained on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort (n=764) and evaluated using a rigorous patient-level split. We assessed its ability to forecast a suite of future clinical indicators (e.g., CDR-SB, MMSE) against traditional machine learning baselines (Lasso, Random Forest, Gradient Boosting) and specialized deep learning models (ResNet-3D, Med-Flamingo).

Results: AD-LLaVA-3D demonstrated superior predictive accuracy on the ADNI test set, achieving a Coefficient of Determination (R²) of 0.68 for the critical CDR-SB score, surpassing the best-performing baseline (R² = 0.66). Crucially, in an independent external validation on the Open Access Series of Imaging Studies (OASIS) cohort (n=76), our model exhibited exceptional generalization (R² = 0.82, MSE = 0.54), whereas comparison models showed significant performance degradation (R² < 0.60).

Conclusions: This study presents the first application of a video-based multimodal architecture for AD progression forecasting. By effectively integrating 3D MRI with tabular clinical records, AD-LLaVA-3D offers a robust, generalizable tool for monitoring disease trajectories, significantly advancing predictive capabilities beyond current unimodal or static methods.
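The evaluation protocol named in the abstract (a patient-level split, scored with R² and MSE) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the `patient_id` key, the 20% test fraction, and the function names are all assumptions made for illustration. The point is only that every visit of a given patient lands on one side of the split, so the model is never tested on a patient it saw during training.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split visit records so that no patient appears in both sets.

    `records` is a list of dicts carrying a hypothetical "patient_id" key;
    the abstract does not describe the actual data layout.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients (not visits) so the split is made at patient level.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 1 means perfect prediction, while a model that always predicts the mean of the observed scores gets an R² of 0; the reported values of 0.68 (ADNI) and 0.82 (OASIS) sit between those two reference points.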
Highlights

First-in-Class Architecture: We introduce the first application of video-based Large Vision-Language Models (LVLMs) to interpret 3D volumetric MRI as a temporal sequence, capturing longitudinal neurodegeneration more effectively than static 3D-CNNs.

Robust External Validation: The model achieved superior predictive accuracy (R² = 0.82) on an independent external cohort (OASIS), demonstrating exceptional generalization beyond the training population (ADNI).

Data-Efficient Multimodal Integration: We developed a novel prompting strategy that integrates sparse Tabular Clinical Records (TCR) without artificial imputation, allowing the model to leverage incomplete real-world medical history.

Clinical Trial Enrichment: By accurately forecasting future cognitive scores (CDR-SB, MMSE), AD-LLaVA-3D serves as a precise screening tool to identify "rapid progressors" for clinical trials, potentially reducing failure rates in drug development.
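The idea of feeding sparse tabular records to a language model without imputation can be sketched as follows. The abstract does not describe the actual prompt format, so everything here is hypothetical: the function names, the prompt wording, and the `age` field are assumptions, and only MMSE and CDR-SB are taken from the text. The sketch simply omits missing fields from the prompt instead of filling them with estimated values.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def record_to_prompt(record, target="CDR-SB"):
    """Serialize one clinical record into text, skipping missing fields.

    Hypothetical prompt layout for illustration; no values are imputed.
    """
    parts = [f"{name}: {value}" for name, value in record.items() if not is_missing(value)]
    history = "; ".join(parts) if parts else "no clinical measurements available"
    return f"Clinical record: {history}. Forecast the future {target} score."


# Example: only the observed MMSE value reaches the prompt.
example = record_to_prompt({"MMSE": 27, "CDR-SB": None, "age": float("nan")})
```

Because absent measurements are left out rather than replaced, the text the model receives reflects only what was actually recorded for the patient.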

Topics

Dementia and Cognitive Impairment Research; Machine Learning in Healthcare; Artificial Intelligence in Healthcare and Education
