This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Validation of Generative AI Techniques for Synthetic Data Generation in Multiple Sclerosis Research: A Comparison with Real-World Evidence from the Italian MS Registry

2025 · 0 citations · medRxiv · Open Access

Abstract

Importance: Large multiple sclerosis (MS) registries provide crucial real-world evidence but often suffer from missing data, inconsistencies, and privacy limitations that restrict data sharing. Using generative AI to create synthetic data (SD) is an emerging strategy for strengthening real-world evidence research and potentially overcoming these challenges.

Objective: To evaluate the validity of AI-generated SD in replicating real data collected in the Italian MS and Related Disorders Register (RISM), and to compare the risk of progression independent of relapse activity (PIRA) between an early intensive treatment (EIT) strategy and an escalation treatment (ESC) strategy in both real and synthetic MS cohorts.

Design, Setting, and Participants: This validation study analyzed data from RISM. AI-based generative models were trained on a sub-cohort of 1,666 patients with tabularized MRI data to generate a synthetic dataset of 4,878 patients. SD was evaluated using the Synthetic vAlidation FramEwork powered by Train (SAFE), which assesses fidelity, utility, and privacy. Clinical Synthetic Fidelity (CSF) and Nearest Neighbor Distance Ratio (NNDR) were used for statistical and privacy validation. For clinical validation, treatment outcomes under the EIT and ESC strategies were compared in both the real and synthetic datasets, focusing on the risk of PIRA.

Exposures: Initial disease-modifying therapy strategy, categorized as EIT versus ESC.

Main Outcomes and Measures: The primary outcome was the occurrence of PIRA, defined as confirmed disability accrual independent of relapses. Validation metrics included Clinical Synthetic Fidelity (CSF ≥90 optimal) and Nearest Neighbor Distance Ratio (NNDR, range 0.60–0.85 for privacy).

Results: The synthetic dataset demonstrated high fidelity (CSF = 97%) and privacy preservation (NNDR = 0.61). Treatment effect estimates for ESC vs EIT were consistent across the real and synthetic datasets, showing largely comparable trends and greater statistical significance in SD. Cox proportional hazards models confirmed the robustness of synthetic data in estimating the risk of the first PIRA event.

Conclusions and Relevance: AI-generated synthetic data reliably replicated treatment effect outcomes from real-world RISM data, overcoming missing data and providing a privacy-preserving alternative for data sharing and clinical research.

Key points

Question: Can artificial intelligence (AI)-generated synthetic data (SD) reliably replicate multiple sclerosis (MS) registry data and provide robust insights into progression independent of relapse activity (PIRA) under different treatment strategies?

Findings: In a cohort of 4,878 relapsing-onset MS patients from the Italian MS Register, AI-generated SD achieved high fidelity (CSF = 97%) and reproduced treatment effect outcomes. Both real and synthetic cohorts consistently showed that early intensive therapy reduced the risk of PIRA compared with an escalation strategy.

Meaning: SD can complement and enhance registry-based research by addressing missing data and supporting reproducible analyses in MS.
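The abstract reports NNDR as its privacy metric but does not spell out how it is computed. The sketch below follows a commonly used definition, which is an assumption here and not necessarily what SAFE implements: for each synthetic record, divide the distance to its nearest real record by the distance to its second-nearest real record. Ratios near 0 suggest a synthetic record sits very close to one specific real record, while ratios near 1 suggest it is not tied to any single one. The function names `nndr_per_record` and `mean_nndr` are hypothetical, records are assumed to be numeric tuples that have already been scaled to comparable ranges, and Euclidean distance is an illustrative choice.

```python
import math
from statistics import mean


def nndr_per_record(synthetic, real):
    """Return one nearest neighbor distance ratio per synthetic record.

    Assumption: ratio = (distance to nearest real record) /
    (distance to second-nearest real record), using Euclidean distance.
    """
    if len(real) < 2:
        raise ValueError("at least two real records are required")
    ratios = []
    for s in synthetic:
        # Distances from this synthetic record to every real record, ascending.
        dists = sorted(math.dist(s, r) for r in real)
        d1, d2 = dists[0], dists[1]
        # Convention chosen for this sketch: if the two nearest real records
        # both coincide with the synthetic one, report 0.0 (an exact copy).
        ratios.append(d1 / d2 if d2 > 0 else 0.0)
    return ratios


def mean_nndr(synthetic, real):
    """Summarize the per-record ratios as a single average."""
    return mean(nndr_per_record(synthetic, real))


# Toy data, invented purely for illustration (not from RISM).
real_records = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
synthetic_records = [(1.0, 0.0), (5.0, 0.0)]

ratios = nndr_per_record(synthetic_records, real_records)
summary = mean_nndr(synthetic_records, real_records)
```

In the toy data, the first synthetic point lies close to a single real point and gets a low ratio, while the second is equidistant from two real points and gets a ratio of 1. A summary value such as the abstract's 0.61 would then be compared against the stated 0.60–0.85 range.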

Topics

Multiple Sclerosis Research Studies; Privacy-Preserving Technologies in Data; Artificial Intelligence in Healthcare and Education
