This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Explaining the 'Unexplainable' Large Language Models
Citations: 0
Authors: 6
Year: 2026
Abstract
The integration of Large Language Models (LLMs) into critical societal and scientific functions has intensified the urgent demand for transparency, reliability, and trust. While post-hoc attribution methods and Chain-of-Thought reasoning currently serve as the dominant approaches to explainability, growing evidence shows that they are often unreliable, producing brittle, misleading, or illusory explanations that fail to reflect true model behavior. This tutorial aims to unpack why these limitations arise. We first establish the theoretical intractability of complete mechanistic explanations for modern LLMs and clarify the intrinsic barriers to achieving full transparency in overparameterized models. We then pivot to a principled alternative: user-centric explainability, with a focus on concept-based interpretability and controlled data attribution. We review the theoretical foundations of these methods and survey their modern extensions that enable comprehensive explanation, inference-time intervention, and model editability. Finally, we demonstrate how such approaches support effective human–AI collaboration in high-stakes scientific and decision-critical applications. By synthesizing foundational theory, critical analysis of existing methods, and emerging techniques, this tutorial offers a coherent framework for developing the next generation of explainable and trustworthy AI systems.
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,253 citations
Generative Adversarial Nets
2023 · 19,841 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,230 citations
"Why Should I Trust You?"
2016 · 14.156 Zit.
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,093 citations