This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Mitigating Exploding Gradients in Large Language Models with Neural Architecture Search
Citations: 0
Authors: 4
Year: 2025
Abstract
Large Language Models (LLMs) have significantly advanced natural language processing but frequently encounter instability during training, most notably exploding gradients, which disrupt convergence, prolong training cycles, and degrade final model performance. To address this, we propose a targeted Neural Architecture Search (NAS) framework designed to automatically identify architectures optimized for gradient stability. Our methodology strategically adjusts model depth, incorporates adaptive normalization strategies, and optimizes skip connections. We review recent NAS techniques aimed at stabilizing training, highlighting the impact of dynamic architectural adjustments and adaptive normalization methods. Preliminary results indicate that architectures identified by our NAS framework substantially reduce the incidence of gradient explosion, leading to smoother convergence and improved performance. The paper offers practical insights and automated strategies for robust architecture design, essential for advancing stable and efficient LLM training.
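The full article describes the NAS search space in detail; as a rough illustration only, the following minimal PyTorch sketch shows the kind of architectural ingredients the abstract names (skip connections, normalization placement, variable depth) together with a simple gradient-norm check for detecting exploding gradients. The class and function names here are hypothetical and are not taken from the paper.

```python
# Illustrative sketch, not the paper's implementation: a pre-normalized residual
# block with a learnable skip scale, plus a gradient-norm probe that flags
# exploding gradients during a training step.
import torch
import torch.nn as nn


class StabilizedBlock(nn.Module):
    """Residual feed-forward block with pre-normalization and a learnable skip scale."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )
        # Scaling the residual branch is one way a search procedure can damp
        # gradient growth through deep stacks (hypothetical search variable).
        self.skip_scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.skip_scale * self.ff(self.norm(x))


def total_gradient_norm(model: nn.Module) -> float:
    """Global L2 norm of all parameter gradients; large values indicate explosion."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5


# Minimal usage: one forward/backward pass and a gradient-norm check.
depth = 4  # depth would itself be a search variable in a NAS setting
model = nn.Sequential(*[StabilizedBlock(64) for _ in range(depth)])
x = torch.randn(8, 16, 64)
loss = model(x).pow(2).mean()
loss.backward()
print(f"gradient norm: {total_gradient_norm(model):.3f}")
```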