This is an overview page with metadata for this scientific work. The full article is available from the publisher.
The Asymmetric Burden of Proof: LLMs Show a Null-Result Asymmetry in a Matched-Vignette Benchmark
Citations: 0
Authors: 1
Year: 2026
Abstract
This paper presents empirical evidence of a systematic epistemic failure mode in large language models termed the asymmetric burden of proof. Using a matched-pair benchmark design, three models (GPT-4o, GPT-5.2 Thinking, Claude Haiku 4.5) evaluated fictional scientific vignettes in which evidence quality was held constant while only the conclusion direction was reversed. Across all six model-format conditions, models allocated significantly less probability mass to null claims than to matched positive claims, with gaps of 19.6 to 56.7 percentage points. The asymmetry was directionally consistent in 23 of 24 pair-condition cells and persisted even when discrete classification labels collapsed entirely, surfacing through probability allocation rather than categorical commitment. A secondary finding documents label collapse in newer models, where probability-based asymmetry persists invisibly to label-based monitoring systems. Findings have direct implications for LLM deployment in evidence synthesis, safety assessment, and decision-support pipelines. Dataset and methodology available from the author.
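The abstract reports probability-mass gaps between matched positive and null claims, along with directional consistency across pair-condition cells. The paper's exact scoring procedure is not given here, so the following is a minimal illustrative sketch, assuming each matched pair yields a probability allocation for the positive-framed claim and its null-framed counterpart; all names and data are hypothetical.

```python
# Illustrative sketch only: the benchmark's actual scoring code is not public.
# For each matched vignette pair, compare the probability mass a model assigns
# to the positive-framed claim vs. the null-framed claim (evidence held constant).

def asymmetry_gap(pairs):
    """pairs: list of (p_positive, p_null) probability allocations in [0, 1].
    Returns (mean gap in percentage points, count of directionally consistent
    pairs, i.e. pairs where the positive claim received more mass)."""
    gaps = [(p_pos - p_null) * 100 for p_pos, p_null in pairs]
    consistent = sum(1 for g in gaps if g > 0)
    return sum(gaps) / len(gaps), consistent

# Toy data: three matched pairs in which positive claims receive more mass.
pairs = [(0.70, 0.45), (0.80, 0.35), (0.60, 0.50)]
mean_gap, n_consistent = asymmetry_gap(pairs)
print(f"mean gap: {mean_gap:.1f} pp, consistent pairs: {n_consistent}/{len(pairs)}")
```

A gap measured this way can remain large even when both variants receive the same discrete label, which is the mechanism behind the label-collapse finding: the asymmetry surfaces in probability allocation while label-based monitoring sees nothing.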
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,366 citations
Generative Adversarial Nets
2014 · 19,841 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,244 citations
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
2016 · 14,255 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,122 citations