This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Utility-Preserving DP-SGD for Large Language Models in Mental-Health EHRs
Citations: 0
Authors: 4
Year: 2025
Abstract
Large language models can ease psychiatrists' heavy documentation load, but only if they learn from patient notes without revealing anything about the people behind them. We introduce a two-step privacy pipeline. First, a combination of pattern-matching rules and a neural tagger scrubs lingering names, dates, and other identifiers, removing 99% of residual protected health information (PHI) from notes with a median length of 421 tokens. Second, we fine-tune the model with a differential-privacy mechanism that adds just enough random noise to conceal any single patient's contribution, automatically tuning the protection for both short and long notes. Applied to 147,000 psychiatric notes from the public MIMIC-IV database, the resulting model predicts the next word correctly 85% of the time, within three percentage points of a non-private baseline, while membership-inference attacks drop from an area under the ROC curve (AUC) of 0.86 to a near-random 0.52. The privacy measures add only 13 hours of computation on a four-GPU server, demonstrating that strong privacy and practical utility can coexist in real-world clinical language models.
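The differentially private fine-tuning described in the abstract is built on the standard DP-SGD recipe: clip each example's gradient to a fixed L2 bound, then add Gaussian noise calibrated to that bound before averaging. The sketch below is a minimal NumPy illustration of one such update step; the function name, hyperparameter values, and per-note noise calibration are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update (illustrative sketch, not the paper's code).

    per_example_grads: array of shape (batch, dim), one gradient per note.
    clip_norm and noise_multiplier are placeholder hyperparameters.
    """
    rng = rng or np.random.default_rng(0)
    # Per-example L2 norms; clip each gradient so its norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise scaled to the clipping bound, then average over the batch.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return noisy_sum / per_example_grads.shape[0]
```

Because every example's influence on the sum is bounded by `clip_norm`, the added noise masks any single patient's contribution, which is what drives the membership-inference AUC toward the near-random 0.52 reported above.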
Similar Works
"Why Should I Trust You?"
2016 · 14,294 cit.
A Comprehensive Survey on Graph Neural Networks
2020 · 8,666 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,189 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,588 cit.
Artificial intelligence in healthcare: past, present and future
2017 · 4,405 cit.