OpenAlex · Updated hourly · Last updated: 15.03.2026, 12:34

This is an overview page with metadata about this scientific work. The full article is available from the publisher.

Task-Specific Knowledge Distillation for Accurate and Efficient Text Summarization

2025 · 0 citations · 4 authors

Open full text at publisher

Abstract

Generative AI has emerged as a transformative technology in natural language processing (NLP), enabling advanced capabilities such as text summarization, question answering, and content generation. Large Language Models (LLMs) have demonstrated exceptional performance on NLP tasks, with text summarization being a prominent example. Proprietary LLMs, such as GPT-4 and LLaMA 70B, achieve high accuracy but often incur substantial computational costs, usage fees, limited accessibility, and potential data privacy risks. In contrast, compact LLMs, including LLaMA 3.1 8B and Falcon 7B, offer greater flexibility, transparency, and control but frequently suffer from factual inconsistencies and semantic errors. In this study, we propose a task-specific knowledge distillation (KD) technique to transfer summarization capabilities from large teacher models LLaMA 3.1 (70B), Falcon (40B), Gemma 2 (27B), and Qwen 2.5 (72B) to smaller student models (8B, 7B, 2B, and 7B, respectively). The distillation process leverages both cross-entropy loss and Kullback-Leibler divergence to align student predictions with teacher outputs. Distilled models are evaluated on Semantic Textual Similarity (STS-B) and Multi-Genre Natural Language Inference (MNLI) tasks, along with response-time metrics. In experiments, the distilled LLaMA 3.1 8B student model achieves a 0.85 STS-B Pearson correlation and 0.81 MNLI accuracy, retaining over 90% of the teacher model's performance while reducing response time from 12 s to 3 s. Overall, distilled student models retain 85–90% of the teachers' semantic and factual performance while reducing inference latency by 3–6×. These findings demonstrate that logit-based knowledge distillation enables the development of accurate and efficient summarization models suitable for resource-constrained environments.
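The distillation objective described in the abstract (hard-label cross-entropy combined with a KL-divergence term aligning student logits to teacher logits) can be sketched roughly as follows. This is a minimal illustrative version, not the paper's actual implementation: the `temperature`, the `alpha` weighting, and the function names are assumptions chosen for clarity.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax with the usual max-shift for stability."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of hard-label CE and teacher-student KL (logit-based KD)."""
    # Hard-label cross-entropy on the student's ordinary (T = 1) distribution.
    probs = softmax(student_logits)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    # KL(teacher || student) on temperature-softened distributions; the T^2
    # factor keeps the term's gradient magnitude comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)) * T**2
    return alpha * ce + (1.0 - alpha) * kl

# Toy example: a batch of 4 "tokens" over a 10-way vocabulary.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 10))
teacher = rng.normal(size=(4, 10))
labels = np.array([1, 3, 5, 7])
loss = distillation_loss(student, teacher, labels)
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the cross-entropy term remains, which is a quick sanity check for any implementation of this loss.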
The entire code implementation can be found at: https://github.com/Abishethvarman/KD-Text-Summarization

Similar works