This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Mitigating Exploding Gradients in Large Language Models with Neural Architecture Search
Citations: 0
Authors: 4
Year: 2025
Abstract
Large Language Models (LLMs) have significantly advanced natural language processing but frequently encounter instability during training, most notably exploding gradients, which disrupt convergence, prolong training cycles, and degrade final model performance. To address this, we propose a targeted Neural Architecture Search (NAS) framework designed to automatically identify architectures optimized for gradient stability. Our methodology strategically adjusts model depth, incorporates adaptive normalization strategies, and optimizes skip connections. We review recent NAS techniques aimed at stabilizing training, highlighting the impact of dynamic architectural adjustments and adaptive normalization methods. Preliminary results indicate that architectures identified by our NAS framework substantially reduce the incidence of gradient explosion, leading to smoother convergence and improved performance. The paper offers practical insights and automated strategies for robust architecture design, essential for advancing stable and efficient LLM training.
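The full article describes the NAS search space in detail; as a rough illustration only, the following minimal PyTorch sketch shows the kind of architectural ingredients the abstract names (skip connections, normalization placement, variable depth) together with a simple gradient-norm check for detecting exploding gradients. The class and function names here are hypothetical and are not taken from the paper.

```python
# Illustrative sketch, not the paper's implementation: a pre-normalized residual
# block with a learnable skip scale, plus a gradient-norm probe that flags
# exploding gradients during a training step.
import torch
import torch.nn as nn


class StabilizedBlock(nn.Module):
    """Residual feed-forward block with pre-normalization and a learnable skip scale."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )
        # Scaling the residual branch is one way a search procedure can damp
        # gradient growth through deep stacks (hypothetical search variable).
        self.skip_scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.skip_scale * self.ff(self.norm(x))


def total_gradient_norm(model: nn.Module) -> float:
    """Global L2 norm of all parameter gradients; large values indicate explosion."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5


# Minimal usage: one forward/backward pass and a gradient-norm check.
depth = 4  # depth would itself be a search variable in a NAS setting
model = nn.Sequential(*[StabilizedBlock(64) for _ in range(depth)])
x = torch.randn(8, 16, 64)
loss = model(x).pow(2).mean()
loss.backward()
print(f"gradient norm: {total_gradient_norm(model):.3f}")
```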