OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 25.04.2026, 20:50

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training

2025·0 Zitationen·ArXiv.orgOpen Access
Volltext beim Verlag öffnen

0

Zitationen

2

Autoren

2025

Jahr

Abstract

While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as \textit{event estimation} and \textit{event intervention}, we reduce LLMs' pre-training biases without incurring additional fine-tuning biases, thus enhancing the model's generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples, demonstrating the effectiveness and sample efficiency of CAPT.

Ähnliche Arbeiten

Autoren

Themen

Topic ModelingExplainable Artificial Intelligence (XAI)Artificial Intelligence in Healthcare and Education
Volltext beim Verlag öffnen