OpenAlex · Updated hourly · Last updated: 19.03.2026, 21:06

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A High-Performance Knowledge Distillation Framework Based on Temperature Decoupling

2025 · 0 citations
Open full text at the publisher

0 citations

3 authors

2025 (year)

Abstract

Knowledge Distillation (KD) has become a widely used model compression technique for large language models (LLMs). Most mainstream KD methods adopt a temperature-sharing mechanism, where both teacher and student models use a common softmax temperature to smooth predictions. However, this shared-temperature setting often results in distribution misalignment between the two models, limiting distillation effectiveness. To address this issue, we propose Temperature-Decoupled Knowledge Distillation (TDKD), a distillation framework that allows independent temperature control for teacher and student models. We theoretically justify this decoupling using the Lagrange principle, and introduce ExpStep-TS, a generalized exponential-stepwise temperature schedule that enhances flexibility in temperature tuning. Furthermore, we propose an Asymmetric Temperature Correction (ATC) mechanism to analyze the impact of the temperature scaling factor on forward and reverse KL divergences, and develop the TDKL loss accordingly. Experiments conducted on the GPT-2 model family across four datasets (Dolly, Self-Instruct, Sinst, and Vicuna) demonstrate that our framework is compatible with multiple loss functions and achieves consistent ROUGE-L improvements of 6%–15%, validating its effectiveness in natural language generation and summarization.
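This page does not include the paper's exact TDKD/TDKL formulation, the ExpStep-TS schedule, or the ATC mechanism, so the sketch below only illustrates the core idea named in the abstract: giving teacher and student independent softmax temperatures in a KL-based distillation loss, instead of one shared temperature. It is a minimal PyTorch sketch under that assumption; the function name tdkd_loss, the default temperatures, and the choice of forward KL are illustrative and are not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def tdkd_loss(student_logits: torch.Tensor,
              teacher_logits: torch.Tensor,
              t_student: float = 1.0,
              t_teacher: float = 4.0) -> torch.Tensor:
    """Forward-KL distillation loss with decoupled softmax temperatures:
    teacher and student distributions are smoothed independently."""
    # Teacher distribution smoothed with its own temperature.
    teacher_probs = F.softmax(teacher_logits / t_teacher, dim=-1)
    # Student log-distribution smoothed with a separate temperature.
    student_log_probs = F.log_softmax(student_logits / t_student, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: distill next-token distributions over a GPT-2-sized vocabulary.
student_logits = torch.randn(8, 50257)
teacher_logits = torch.randn(8, 50257)
loss = tdkd_loss(student_logits, teacher_logits, t_student=2.0, t_teacher=4.0)
```

With t_student equal to t_teacher this reduces to a standard shared-temperature KD objective; the paper's ExpStep-TS schedule would vary the temperatures during training and its ATC mechanism would adjust the forward/reverse KL terms, but their exact forms are not given on this page.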

Related works

Authors

Institutions

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare