OpenAlex · Updated hourly · Last updated: 2026-03-16, 14:14

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Understanding Clinical Reasoning Variability in Medical Large Language Models: A Mechanistic Interpretability Study

2026 · 0 citations · Open Access

Citations: 0 · Authors: 12 · Year: 2026

Abstract

Medical large language models (LLMs) that achieve high benchmark accuracy exhibit unexplained variability in clinical tasks, producing errors that clinicians cannot safeguard against. We evaluated clinical reasoning stability in GPT-5, MedGemma-27B-Text-IT, and OpenBioLLM-Llama3-70B using 355 systematic perturbations of physician-validated oncology cases, and trained sparse autoencoders on 1 billion tokens from 50,000 MIMIC-IV clinical notes to decompose the models' internal representations. We find that the models exhibit dramatic reasoning instability, shifting staging accuracy by over 50% based solely on prompt format, or generating definitive staging in clinically insufficient scenarios. Sparse autoencoder analysis revealed hierarchical encoding in MedGemma, where high-magnitude features encode lexical identity and low-magnitude features encode contextual meaning; OpenBioLLM, by contrast, distributes information uniformly. We demonstrate that these internal encoding structures differentially affect retrieval interventions, suggesting that interventions effective for one architecture may harm another. We recommend that healthcare institutions implement architecture-specific safety validation, as benchmark equivalence does not imply functional equivalence, with implications for AI safety beyond healthcare.
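The sparse-autoencoder decomposition the abstract refers to can be illustrated with a minimal sketch. The toy NumPy version below (tied encoder/decoder weights, ReLU features, an L1 sparsity penalty, and plain gradient descent — all implementation choices assumed here for illustration, not taken from the paper) trains on synthetic "activations" standing in for real model activations over clinical text:

```python
import numpy as np

def train_sparse_autoencoder(X, n_features=64, l1_coef=1e-3, lr=1e-2,
                             epochs=200, seed=0):
    """Toy tied-weight sparse autoencoder with an L1 sparsity penalty.

    X: (n_samples, d_model) activation matrix.
    Loss: mean((x_hat - x)^2) + l1_coef * mean(|h|).
    Returns (W, b_enc, b_dec, losses).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_features))  # tied weights
    b_enc = np.zeros(n_features)
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.maximum(X @ W + b_enc, 0.0)   # ReLU feature activations
        X_hat = H @ W.T + b_dec              # reconstruction
        R = X_hat - X
        losses.append((R ** 2).mean() + l1_coef * np.abs(H).mean())
        # Backprop: decoder path plus L1 term, gated by the ReLU mask.
        mask = (H > 0).astype(X.dtype)
        G = (2.0 / R.size) * R               # dLoss/dX_hat
        dH = (G @ W + (l1_coef / H.size) * np.sign(H)) * mask
        dW = X.T @ dH + G.T @ H              # encoder + decoder terms (tied)
        W -= lr * dW
        b_enc -= lr * dH.sum(axis=0)
        b_dec -= lr * G.sum(axis=0)
    return W, b_enc, b_dec, losses

# Toy usage on synthetic "activations" (hypothetical data, not MIMIC-IV).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))
W, b_enc, b_dec, losses = train_sparse_autoencoder(X)
```

After training, each column of `W` plays the role of one learned feature direction; the L1 term encourages each input to activate only a few of them, which is what makes per-feature analyses (such as the magnitude-based comparison described in the abstract) possible.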

Similar works