This is an overview page with metadata for this scientific work. The full article is available from the publisher.
The Asymmetric Burden of Proof: LLMs Show a Null-Result Asymmetry in a Matched-Vignette Benchmark
Citations: 0
Authors: 1
Year: 2026
Abstract
This paper presents empirical evidence of a systematic epistemic failure mode in large language models termed the asymmetric burden of proof. Using a matched-pair benchmark design, three models (GPT-4o, GPT-5.2 Thinking, Claude Haiku 4.5) evaluated fictional scientific vignettes in which evidence quality was held constant while only the conclusion direction was reversed. Across all six model-format conditions, models allocated significantly less probability mass to null claims than to matched positive claims, with gaps of 19.6 to 56.7 percentage points. The asymmetry was directionally consistent in 23 of 24 pair-condition cells and persisted even when discrete classification labels collapsed entirely, surfacing through probability allocation rather than categorical commitment. A secondary finding documents label collapse in newer models, where probability-based asymmetry persists invisibly to label-based monitoring systems. Findings have direct implications for LLM deployment in evidence synthesis, safety assessment, and decision-support pipelines. Dataset and methodology available from the author.
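The abstract reports probability-mass gaps between matched positive and null claims, along with directional consistency across pair-condition cells. The paper's exact scoring procedure is not given here, so the following is a minimal illustrative sketch, assuming each matched pair yields a probability allocation for the positive-framed claim and its null-framed counterpart; all names and data are hypothetical.

```python
# Illustrative sketch only: the benchmark's actual scoring code is not public.
# For each matched vignette pair, compare the probability mass a model assigns
# to the positive-framed claim vs. the null-framed claim (evidence held constant).

def asymmetry_gap(pairs):
    """pairs: list of (p_positive, p_null) probability allocations in [0, 1].
    Returns (mean gap in percentage points, count of directionally consistent
    pairs, i.e. pairs where the positive claim received more mass)."""
    gaps = [(p_pos - p_null) * 100 for p_pos, p_null in pairs]
    consistent = sum(1 for g in gaps if g > 0)
    return sum(gaps) / len(gaps), consistent

# Toy data: three matched pairs in which positive claims receive more mass.
pairs = [(0.70, 0.45), (0.80, 0.35), (0.60, 0.50)]
mean_gap, n_consistent = asymmetry_gap(pairs)
print(f"mean gap: {mean_gap:.1f} pp, consistent pairs: {n_consistent}/{len(pairs)}")
```

A gap measured this way can remain large even when both variants receive the same discrete label, which is the mechanism behind the label-collapse finding: the asymmetry surfaces in probability allocation while label-based monitoring sees nothing.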
Similar Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,366 citations
Generative Adversarial Nets
2014 · 19,841 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,244 citations
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
2016 · 14,255 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,122 citations