OpenAlex · Updated hourly · Last updated: 12.03.2026, 12:07

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Explaining the 'Unexplainable' Large Language Models

2026 · 0 citations · Open Access

Citations: 0 · Authors: 6 · Year: 2026

Abstract

The integration of Large Language Models (LLMs) into critical societal and scientific functions has intensified the urgent demand for transparency, reliability, and trust. While post-hoc attribution methods and Chain-of-Thought reasoning currently serve as the dominant approaches to explainability, growing evidence shows that they are often unreliable, producing brittle, misleading, or illusory explanations that fail to reflect true model behavior. This tutorial aims to unpack why these limitations arise. We first establish the theoretical intractability of complete mechanistic explanations for modern LLMs and clarify the intrinsic barriers to achieving full transparency in overparameterized models. We then pivot to a principled alternative: user-centric explainability, with a focus on concept-based interpretability and controlled data attribution. We review the theoretical foundations of these methods and survey their modern extensions that enable comprehensive explanation, inference-time intervention, and model editability. Finally, we demonstrate how such approaches support effective human-AI collaboration in high-stakes scientific and decision-critical applications. By synthesizing foundational theory, critical analysis of existing methods, and emerging techniques, this tutorial offers a coherent framework for developing the next generation of explainable and trustworthy AI systems.
