OpenAlex · Updated hourly · Last updated: 17 March 2026, 10:24

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Federated Deep NLP: Mitigating Layer-Wise Gradient Divergence via Architectural and Data-Centric Design

2025 · 0 citations

1 author

Abstract

Federated learning enables privacy-preserving training of large language models across distributed clients, yet deeper neural architectures often underperform centralized training because client update directions drift apart layer by layer. We introduce a formal, layer-resolved divergence measure for federated optimization and analyze how backpropagation accumulates client-specific disparities, amplifying instability as depth increases. Guided by this analysis, we propose practical model- and data-centric principles for federated NLP: widen layers to promote “lazy” updates that stabilize gradients; constrain effective receptive fields in early representations (e.g., smaller context windows or localized attention) to align low-level features across clients; and harmonize input preprocessing (e.g., resolution/tokenization granularity and augmentation) to reduce inter-client variability at the source. Extensive experiments across federated transformers of varying depth show that this approach reduces layer-wise gradient divergence by up to 37% compared with baseline federated training, speeds up convergence by 1.8×, and narrows the accuracy gap with centralized training from 12.3% to 3.1% on text classification tasks. The results highlight divergence-aware design as a simple, effective recipe for reliably scaling deep NLP models in federated learning.
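The abstract names a layer-resolved divergence measure but does not define it on this page. As a minimal sketch, assuming divergence is taken as the normalized spread of client updates around their unweighted mean, computed separately for each layer (a hypothetical definition, not necessarily the authors'), such a measure could look like this. The function name `layer_divergence`, the `eps` parameter, and the layer names are illustrative only.

```python
from typing import Dict, List

Vector = List[float]  # a flattened parameter update for one layer


def _sq_norm(v: Vector) -> float:
    """Squared Euclidean norm of a flat vector."""
    return sum(x * x for x in v)


def layer_divergence(
    client_updates: List[Dict[str, Vector]], eps: float = 1e-12
) -> Dict[str, float]:
    """Hypothetical per-layer divergence of client updates in one FL round.

    For layer l with client updates g_k (k = 1..K) and their unweighted
    mean g_bar, returns (1/K) * sum_k ||g_k - g_bar||^2 / (||g_bar||^2 + eps).
    A value of 0 means all clients agree on that layer; larger values mean
    the clients' updates drift apart.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    num_clients = len(client_updates)
    result: Dict[str, float] = {}
    for layer in client_updates[0]:
        vectors = [update[layer] for update in client_updates]
        dim = len(vectors[0])
        if any(len(v) != dim for v in vectors):
            raise ValueError(f"shape mismatch in layer {layer!r}")
        # Unweighted element-wise mean of the client updates.
        mean = [sum(v[i] for v in vectors) / num_clients for i in range(dim)]
        # Average squared distance of each client's update from the mean.
        spread = sum(
            _sq_norm([a - b for a, b in zip(v, mean)]) for v in vectors
        ) / num_clients
        # Normalize by the size of the mean update so layers are comparable.
        result[layer] = spread / (_sq_norm(mean) + eps)
    return result


# Two clients that agree on the embedding layer but disagree on a deep block.
updates = [
    {"embed": [1.0, 0.0], "block_11": [1.0, 1.0]},
    {"embed": [1.0, 0.0], "block_11": [-1.0, 1.0]},
]
print(layer_divergence(updates))
```

Comparing the values from shallow to deep layers is one way to check whether divergence grows with depth, as the abstract argues. A FedAvg-style variant would weight each client by its local data size, which this sketch omits for brevity.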


Topics

Privacy-Preserving Technologies in Data
Mobile Crowdsensing and Crowdsourcing
Artificial Intelligence in Healthcare and Education