OpenAlex · Updated hourly · Last updated: 02.04.2026, 23:34

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

2025 · 0 citations · arXiv (Cornell University) · Open Access

Citations: 0
Authors: 6
Year: 2025

Abstract

We examine the reliability of a widely used clinical AI benchmark whose reference labels were partially generated by LLMs, and find that a substantial fraction are clinically misaligned. We introduce a phased stewardship procedure to amplify the positive impact of physician experts' feedback and then demonstrate, via a controlled RL experiment, how uncaught label bias can materially affect downstream LLM evaluation and alignment. Our results demonstrate that partially LLM-generated labels can embed systemic errors that distort not only evaluation but also downstream model alignment. By adopting a hybrid oversight system, we can prioritize scarce expert feedback to maintain benchmarks as living, clinically-grounded documents. Ensuring this alignment is a prerequisite for the safe deployment of LLMs in high-stakes medical decision support.

Topics

Machine Learning in Healthcare · Machine Learning and Data Classification · Artificial Intelligence in Healthcare and Education