This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Towards Uncovering How Large Language Models Work: An Interpretability Perspective
0
Citations
6
Authors
2025
Year
Abstract
Large language models (LLMs) have shown remarkable performance in tackling natural language tasks, yet the internal mechanisms that enable their impressive generalization and reasoning abilities remain opaque. This lack of transparency presents significant challenges in fundamentally eliminating undesirable behaviors such as hallucinations and toxicity, hindering the safe and beneficial deployment of LLMs. This survey paper aims to uncover the internal working mechanisms underlying LLM functionality through the lens of explainability. First, we review how knowledge is encoded within LLMs via mechanistic interpretability techniques. Then, we summarize what knowledge is embedded in LLM representations by leveraging probing techniques and representation engineering. Additionally, we investigate the training dynamics to explore models' generalization abilities through grokking and memorization. Finally, we explore how the insights gained from these explanations can further enhance LLM performance through model editing, improve efficiency through pruning, and better align with human values.
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,639 citations
Generative Adversarial Nets
2023 · 19,894 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,312 citations
"Why Should I Trust You?"
2016 · 14,486 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,181 citations