This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Federated Deep NLP: Mitigating Layer-Wise Gradient Divergence via Architectural and Data-Centric Design
Citations: 0
Authors: 1
Year: 2025
Abstract
Federated learning enables privacy-preserving training of large language models across distributed clients, yet deeper neural architectures often underperform compared to centralized training due to layer-wise drift in client update directions. We introduce a formal, layer-resolved divergence measure for federated optimization and analyze how backpropagation accumulates client-specific disparities, amplifying instability as depth increases. Guided by this analysis, we propose practical, model- and data-centric principles for federated NLP: widen layers to promote “lazy” updates that stabilize gradients, constrain effective receptive fields in early representations (e.g., smaller context windows or localized attention) to align low-level features across clients, and harmonize input preprocessing (e.g., resolution/tokenization granularity and augmentation) to reduce inter-client variability at the source. Extensive experiments across federated transformers of varying depth demonstrate that our approach reduces layer-wise gradient divergence by up to 37% compared to baseline FL, improves convergence speed by 1.8×, and narrows the accuracy gap with centralized training from 12.3% to 3.1% on text classification tasks. The results highlight divergence-aware design as a simple, effective recipe for reliably scaling deep NLP models in federated machine learning.
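Note: the paper's exact divergence measure is not reproduced on this overview page. The following Python sketch illustrates one plausible reading of a layer-resolved divergence measure: for each layer, the mean cosine dissimilarity of each client's gradient from the across-client mean gradient for that layer. The function name layer_divergence and the input format (one layer-name-to-gradient dict per client) are illustrative assumptions, not the authors' API.

import numpy as np

def layer_divergence(client_grads: list[dict[str, np.ndarray]]) -> dict[str, float]:
    """Hypothetical layer-resolved divergence sketch (not the paper's exact measure).

    client_grads: one dict per client, mapping layer name -> gradient array.
    Returns, per layer, the mean cosine dissimilarity of client gradients
    from the across-client mean gradient direction.
    """
    divergence = {}
    for layer in client_grads[0]:
        # Stack all clients' gradients for this layer as rows of a matrix.
        grads = np.stack([np.ravel(g[layer]) for g in client_grads])
        mean = grads.mean(axis=0)
        # Cosine similarity of each client's gradient to the mean direction;
        # the small epsilon guards against zero-norm gradients.
        cos = (grads @ mean) / (
            np.linalg.norm(grads, axis=1) * np.linalg.norm(mean) + 1e-12
        )
        divergence[layer] = float(1.0 - cos.mean())
    return divergence

Under such a measure, divergence growing with layer index would match the depth-amplified instability the abstract describes, since backpropagation compounds client-specific disparities from layer to layer.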
Related Works
k-Anonymity: A Model for Protecting Privacy
2002 · 8,396 citations
Calibrating Noise to Sensitivity in Private Data Analysis
2006 · 6,872 citations
Deep Learning with Differential Privacy
2016 · 5,595 citations
Communication-Efficient Learning of Deep Networks from Decentralized Data
2016 · 5,591 citations
Large-Scale Machine Learning with Stochastic Gradient Descent
2010 · 5,564 citations