This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Utility-Preserving DP-SGD for Large Language Models in Mental-Health EHRs
Citations: 0
Authors: 4
Year: 2025
Abstract
Large language models can ease psychiatrists' heavy documentation load, but only if they learn from patient notes without revealing anything about the people behind them. We introduce a two-step privacy pipeline. First, a combination of pattern-matching rules and a neural tagger scrubs lingering names, dates, and other identifiers, removing 99% of residual protected health information (PHI) from notes with a median length of 421 tokens. Second, we fine-tune the model with a differential-privacy mechanism that adds just enough random noise to conceal any single patient's contribution, automatically tuning the protection for both short and long notes. Applied to 147,000 psychiatric notes from the public MIMIC-IV database, the resulting model predicts the next word correctly 85% of the time, within three percentage points of a non-private baseline, while membership-inference attacks drop from an area under the ROC curve (AUC) of 0.86 to a near-random 0.52. The privacy measures add only 13 hours of computation on a four-GPU server, demonstrating that strong privacy and practical utility can coexist in real-world clinical language models.
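The differentially private fine-tuning described in the abstract is built on the standard DP-SGD recipe: clip each example's gradient to a fixed L2 bound, then add Gaussian noise calibrated to that bound before averaging. The sketch below is a minimal NumPy illustration of one such update step; the function name, hyperparameter values, and per-note noise calibration are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update (illustrative sketch, not the paper's code).

    per_example_grads: array of shape (batch, dim), one gradient per note.
    clip_norm and noise_multiplier are placeholder hyperparameters.
    """
    rng = rng or np.random.default_rng(0)
    # Per-example L2 norms; clip each gradient so its norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise scaled to the clipping bound, then average over the batch.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return noisy_sum / per_example_grads.shape[0]
```

Because every example's influence on the sum is bounded by `clip_norm`, the added noise masks any single patient's contribution, which is what drives the membership-inference AUC toward the near-random 0.52 reported above.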
Similar Works
"Why Should I Trust You?"
2016 · 14,294 cit.
A Comprehensive Survey on Graph Neural Networks
2020 · 8,666 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 cit.
Artificial intelligence in healthcare: past, present and future
2017 · 4,405 cit.