This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Cross Domain Transfer of Natural Language Explanation Models: Pretraining on e-SNLI and Adapting to a New Target Task
0
Citations
4
Authors
2026
Year
Abstract
The extensive use of AI in critical ICT systems requires models that are not only accurate but also transparent and credible. State-of-the-art models, however, are usually black boxes: their decisions can be fragile and often fail to generalize across operating domains. This work addresses the problem of designing effective, transferable natural language explanations (NLEs) by building a multi-task T5-based model that uses a label-prefixed decoder format to jointly predict NLI labels and generate explanations. The model is pretrained on e-SNLI and then fine-tuned under different cross-domain conditions, including label-only supervision, frozen encoders, and varied loss weights. Even when no explanations are available during fine-tuning, the experimental results show that explanation pretraining substantially improves the linguistic fluency, structure, and relevance of the generated explanations. Token deletion tests additionally indicate partial faithfulness: the explanations rely on the same evidence as the classifier. Ablation studies show that stable, transferable explanations depend on balanced loss weighting, encoder adaptation, and explanation supervision. These results point to the need for standardized NLE evaluation tools and indicate how explanation-capable models can be incorporated into ICT systems that require transparency and accountability.
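The abstract does not spell out the label-prefixed format or the loss weighting, so the following is only a minimal sketch of what such a setup might look like. The prompt template, the `explanation:` separator, the function names, and the convex weighting with `alpha` are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a label-prefixed target format for joint
# NLI classification and explanation generation. Every template and
# name below is an assumption made for illustration.

LABELS = ("entailment", "neutral", "contradiction")
SEPARATOR = "explanation:"  # assumed marker between label and explanation


def build_input(premise, hypothesis):
    """Format a premise/hypothesis pair as a text-to-text prompt (assumed template)."""
    return f"explain nli premise: {premise} hypothesis: {hypothesis}"


def build_target(label, explanation=None):
    """Build the decoder target: the label first, then an optional explanation.

    Passing explanation=None mimics label-only supervision, where the
    target task provides no explanations during fine-tuning.
    """
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if explanation is None:
        return label
    return f"{label} {SEPARATOR} {explanation}"


def parse_output(text):
    """Split generated text back into (label, explanation).

    Returns (None, text) when the output does not start with a known label.
    """
    text = text.strip()
    for label in LABELS:
        if text.startswith(label):
            rest = text[len(label):].strip()
            if rest.startswith(SEPARATOR):
                rest = rest[len(SEPARATOR):].strip()
            return label, rest
    return None, text


def joint_loss(label_loss, explanation_loss, alpha=0.5):
    """Combine the two losses as a convex sum (one assumed weighting scheme)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * label_loss + (1.0 - alpha) * explanation_loss
```

Because the label comes first in the decoded sequence, the explanation is generated conditioned on the predicted label, and a label-only target remains a valid prefix of the full format. That property is one plausible reason a model pretrained this way could still be fine-tuned without explanations.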
Related works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,634 citations
Generative Adversarial Nets
2023 · 19,894 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,311 citations
"Why Should I Trust You?"
2016 · 14,478 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,178 citations