This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Differential Privacy on Large Language Models for Privacy Preserving Clinical Coding
0
Citations
4
Authors
2025
Year
Abstract
Recent advancements in Large Language Models (LLMs) have significantly enhanced performance across various Natural Language Processing (NLP) tasks. In certain fields, particularly healthcare, the risk of data leakage in research data management is a critical concern when employing LLMs. To ensure data privacy, recent studies have adopted approaches such as de-identification, which masks out personally identifiable information. However, these anonymisation techniques remain vulnerable to various attacks, including linkage attacks, attribute inference attacks, and membership inference attacks. Differential privacy is a robust anonymisation technique that addresses data leakage by constraining the influence of individual data samples during model training. Nonetheless, balancing utility against privacy protection remains challenging. Moreover, while differential privacy has been extensively studied for tabular and image data, its application in NLP, especially with clinical data, is limited. In this paper, we explore the integration of differential privacy into the fine-tuning process of LLMs for clinical data, covering a range of model sizes and privacy standards within a healthcare context. We use these LLMs to generate synthetic medical notes and assess the privacy and utility of our differential privacy training approach by deploying these synthetic notes in a downstream clinical coding task. Our findings demonstrate that synthetic data from differentially private LLMs achieve classification accuracy comparable or superior to that obtained with non-differentially private LLMs.
Similar works
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8,395 citations
Calibrating Noise to Sensitivity in Private Data Analysis
2006 · 6,871 citations
Deep Learning with Differential Privacy
2016 · 5,592 citations
Communication-Efficient Learning of Deep Networks from Decentralized Data
2016 · 5,591 citations
Large-Scale Machine Learning with Stochastic Gradient Descent
2010 · 5,561 citations