This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
Citations: 0
Authors: 10
Year: 2026
Abstract
Recent advances in medical large language models have explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often relies on majority voting (MV) as a heuristic supervision signal, which can be unreliable in complex medical scenarios where the most frequent reasoning path is not necessarily the clinically correct one. In this work, we propose a novel, unified training paradigm that integrates medical process reward models with TTRL to bridge the gap between test-time scaling (TTS) and parametric model optimization. Specifically, we advance the TTRL framework by replacing conventional MV with a fine-grained, expert-aligned supervision signal from Med-PRM. This integration ensures that reinforcement learning is guided by medical correctness rather than mere consensus, effectively distilling search-based intelligence into the model's parametric memory. Extensive evaluations on four benchmarks demonstrate that our method consistently and significantly outperforms current TTRL and standalone PRM selection. Our findings establish that transitioning from stochastic heuristics to structured, step-wise rewards is essential for developing reliable and scalable medical AI systems.
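The abstract's central change, replacing a majority-vote pseudo-label with step-wise process rewards, can be made concrete with a minimal sketch. The Python snippet below is illustrative only: the Rollout structure, function names, and toy step scorer are assumptions for exposition, not the paper's actual implementation or the Med-PRM API.

```python
# Minimal sketch of the supervision-signal swap described in the abstract:
# standard TTRL derives its reward from a majority-vote consensus over
# sampled answers, while the proposed paradigm scores each reasoning step
# with a process reward model. All names here (Rollout, score_step,
# toy_prm) are hypothetical illustrations.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    steps: List[str]   # intermediate reasoning steps
    answer: str        # final answer extracted from the rollout


def majority_vote_reward(rollouts: List[Rollout]) -> List[float]:
    """Standard TTRL heuristic: reward 1.0 if a rollout's answer matches
    the most frequent answer across all rollouts, else 0.0."""
    consensus, _ = Counter(r.answer for r in rollouts).most_common(1)[0]
    return [1.0 if r.answer == consensus else 0.0 for r in rollouts]


def process_reward(rollouts: List[Rollout],
                   score_step: Callable[[str], float]) -> List[float]:
    """PRM-guided variant: average a step-level correctness score over
    each rollout's reasoning chain, so the signal reflects the soundness
    of the process rather than answer frequency."""
    return [sum(score_step(s) for s in r.steps) / max(len(r.steps), 1)
            for r in rollouts]


if __name__ == "__main__":
    rollouts = [
        Rollout(["check vitals", "order ECG"], "A"),
        Rollout(["check vitals", "order ECG"], "A"),
        Rollout(["review history", "order troponin", "order ECG"], "B"),
    ]
    # Toy step scorer standing in for a trained process reward model.
    toy_prm = lambda step: 1.0 if "ECG" in step or "troponin" in step else 0.5
    print(majority_vote_reward(rollouts))     # consensus favors answer "A"
    print(process_reward(rollouts, toy_prm))  # scores each reasoning chain
```

Under this framing, the minority rollout can receive the highest reward when its reasoning steps score well, which is exactly the failure mode of consensus-based supervision that the abstract highlights.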
Related Works
"Why Should I Trust You?"
2016 · 14,528 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,815 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,377 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,835 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,472 citations