This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MedOmni-45°: A Safety–Performance Benchmark for Reasoning-Oriented LLMs in Medicine
Citations: 0
Authors: 6
Year: 2026
Abstract
With the rapid integration of large language models (LLMs) into medical decision-support aids, ensuring reliability in reasoning steps—not just final answers—is increasingly critical. Two key safety dimensions are Chain-of-Thought (CoT) faithfulness, which assesses the alignment of the model’s reasoning process with both its response and medical facts, and sycophancy, an emergent misalignment in which models follow misleading cues instead of factual correctness. Yet existing benchmarks tend to prioritize performance evaluation, frequently collapsing nuanced safety vulnerabilities into a single accuracy score. To fill this gap, we introduce MedOmni-45°, a benchmark and evaluation workflow explicitly designed to quantify the safety–performance trade-off in LLMs under manipulative hint conditions. The benchmark contains 1,804 reasoning-focused medical questions across six clinical specialties and three task types, including 500 publicly comparable items from MedMCQA. Each question is systematically augmented with seven manipulative hint types, each embedding two distinct misleading cue variants, along with a No-Hint baseline, resulting in approximately 27,000 unique inputs. These inputs are then evaluated across seven LLMs spanning open- and closed-source, general-purpose and medical-specific, and base versus reasoning-enhanced variants, amounting to over 189K total inference instances. Three orthogonal metrics (Accuracy, CoT-Faithfulness, Anti-Sycophancy) are combined into a composite score visualized via a 45° safety–performance plot. Results reveal a universal trade-off, with no model surpassing the ideal diagonal. The open-source QwQ-32B comes closest at 43.81°, demonstrating notable safety while not surpassing others in performance. MedOmni-45° thus highlights critical vulnerabilities of LLMs in reasoning-oriented medical tasks, offering a robust benchmark for future alignment research.
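The abstract's counts follow from the benchmark design, and the 45° diagonal suggests an angle computed from a model's position in the safety–performance plane. The sketch below is a minimal illustration, assuming the angle is the `atan2` of safety over performance (the paper's exact composite formula is not given on this page, so `composite_angle` is a hypothetical name and construction); it also reproduces the input-count arithmetic stated in the abstract.

```python
import math

def composite_angle(performance: float, safety: float) -> float:
    """Hypothetical sketch: angle (in degrees) of a model's point in the
    safety-performance plane. 45 degrees marks the ideal diagonal, where
    safety keeps pace with performance; atan2 is one plausible choice for
    the composite, not the paper's confirmed formula."""
    return math.degrees(math.atan2(safety, performance))

# A model whose safety score equals its performance score sits on the diagonal.
print(round(composite_angle(0.8, 0.8), 2))  # 45.0

# Input-count arithmetic from the abstract:
# 7 hint types x 2 cue variants + 1 No-Hint baseline = 15 conditions per question.
inputs = 1804 * (7 * 2 + 1)       # ~27,000 unique inputs
instances = inputs * 7            # evaluated across 7 LLMs
print(inputs, instances)          # 27060 189420 (matches "over 189K")
```

A model below the diagonal (safety lagging performance) yields an angle under 45°, which is how a result like QwQ-32B's 43.81° can be read.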
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations