OpenAlex · Updated hourly · Last updated: 16 March 2026, 04:09

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Privacy-Preserving Machine Learning Models to Derive Risk Factors of Kidney Disease in EHRs

2025 · 0 citations

Citations: 0
Authors: 5
Year: 2025

Abstract

Chronic Kidney Disease (CKD) is progressive and asymptomatic in its early stages, so early detection of risk factors is vital for timely treatment. The rich, structured data in Electronic Health Records (EHRs) holds great potential for predictive modeling, but these records contain sensitive information that raises privacy concerns, and their use can be constrained by regulation. To address this, we present a complete Privacy-Preserving Machine Learning (PPML) framework that integrates Federated Learning (FL), Differential Privacy (DP), and Synthetic Data Augmentation to identify CKD risk factors without compromising patient confidentiality. Supervised learning algorithms, namely Random Forests (RF), Support Vector Machines (SVMs), and Gradient Boosted Decision Trees (GBDT), were deployed at multiple decentralized institutions, which trained models collaboratively without sharing raw data. Given the sensitivity of the datasets, (ε, δ)-Differential Privacy with ε=1 and δ=1e-5 was applied to provide formal privacy guarantees, while synthetic data was used to alleviate class imbalance and data sparsity. An experimental analysis on de-identified real-world EHR datasets showed that the framework performs comparably to centralized baselines: in the full privacy-preserving setting, XGBoost achieved an AUC-ROC of 0.87, and the performance difference was only about 2 percent on average. Feature importance analysis identified key predictors, including serum creatinine, age, blood pressure, and proteinuria. These findings support the feasibility of interpretable, regulation-friendly machine learning models in heterogeneous healthcare systems and lay the groundwork for future work on time-aware modeling and on NLP-based multimodal risk factor extraction from unstructured clinical text.
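The combination of federated training and noise-based privacy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a model represented by a plain weight vector (for example a linear SVM; tree ensembles such as RF or GBDT would need a different aggregation scheme), and the names `clip_update`, `federated_round`, `max_norm`, and `noise_std` are hypothetical. Translating a given `noise_std` into a concrete (ε, δ) guarantee requires a privacy accountant, which is not shown here.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def federated_round(global_weights, client_updates, client_sizes,
                    max_norm=1.0, noise_std=0.0, rng=None):
    """Run one aggregation round on the server.

    Each institution sends only a weight update, never raw records.
    Updates are clipped, averaged with weights proportional to the
    number of local samples, and perturbed with Gaussian noise.
    noise_std is an illustrative parameter, not a calibrated value.
    """
    rng = rng or random.Random()
    total = sum(client_sizes)
    dim = len(global_weights)
    aggregate = [0.0] * dim
    for update, size in zip(client_updates, client_sizes):
        clipped = clip_update(update, max_norm)
        for i in range(dim):
            aggregate[i] += clipped[i] * size / total
    noisy = [a + rng.gauss(0.0, noise_std) for a in aggregate]
    return [w + d for w, d in zip(global_weights, noisy)]


# Two hypothetical hospitals with 1 and 3 local samples respectively.
updates = [[1.0, 0.0], [0.0, 1.0]]
sizes = [1, 3]

# Without noise the result is the size-weighted average: [0.25, 0.75].
exact = federated_round([0.0, 0.0], updates, sizes, max_norm=10.0)

# With noise the aggregate is randomly perturbed around that average.
private = federated_round([0.0, 0.0], updates, sizes,
                          max_norm=10.0, noise_std=0.1,
                          rng=random.Random(0))
```

Clipping bounds how much any single institution's update can move the shared model, which is what makes the added noise meaningful as a privacy mechanism; larger `noise_std` values strengthen privacy at the cost of accuracy.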

Similar works