This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Baichuan-M2: Scaling Medical Capability with Large Verifier System
1
Citation
34
Authors
2025
Year
Abstract
As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the dynamic, interactive nature of medical consultations. To address this challenge, we introduce a novel dynamic verification framework that moves beyond static answer verification, establishing a large-scale, high-fidelity interactive reinforcement learning system. Our framework comprises two key components: a Patient Simulator that creates realistic clinical environments using de-identified medical records, and a Clinical Rubrics Generator that dynamically produces multi-dimensional evaluation metrics. Building on this foundation, we develop Baichuan-M2, a 32B-parameter medical-augmented reasoning model trained through a multi-stage reinforcement learning strategy with an improved Group Relative Policy Optimization (GRPO) algorithm. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts, achieving a score above 32 on the challenging HealthBench Hard benchmark, a threshold previously exceeded only by GPT-5. Our work demonstrates that a robust dynamic verifier system is essential for aligning LLM capabilities with practical clinical applications, establishing a new Pareto front in the performance-parameter trade-off for medical AI deployment.
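The abstract's verification loop, a rubric-based reward combined with group-relative advantage normalization as in GRPO, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: `rubric_reward`, `grpo_advantages`, and the toy rubric items are assumed names invented for illustration.

```python
# Hypothetical sketch of a dynamic-verification reward loop: a rubric-based
# scorer plus GRPO-style group-relative advantages. All names are illustrative
# assumptions, not the paper's actual API.
from statistics import mean, pstdev


def rubric_reward(response, rubrics):
    """Score a response against dynamically generated rubric items.

    Each rubric item is (criterion_fn, weight); criterion_fn returns True
    when the response satisfies that clinical criterion.
    """
    total = sum(w for _, w in rubrics)
    earned = sum(w for check, w in rubrics if check(response))
    return earned / total if total else 0.0


def grpo_advantages(rewards):
    """Group-relative advantages: normalize each reward within its group of
    sampled responses, as in Group Relative Policy Optimization (GRPO)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Toy rubric: two weighted criteria on a consultation response.
rubrics = [
    (lambda r: "history" in r, 2.0),  # asks about patient history
    (lambda r: "dosage" in r, 1.0),   # specifies medication dosage
]
group = ["take a history first", "dosage: 5 mg", "hello"]
rewards = [rubric_reward(r, rubrics) for r in group]
advantages = grpo_advantages(rewards)
```

In a full training loop, the Patient Simulator would supply the multi-turn consultation context and the Clinical Rubrics Generator would produce the rubric items per case; the normalized advantages then weight the policy-gradient update.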
Related Works
"Why Should I Trust You?"
2016 · 14,259 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,629 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,143 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,535 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,396 citations
Authors
- M. Team
- Chunmeng Dou
- Chong Liu
- Fan YANG
- Fang Li
- J. Jia
- Mingyang Chen
- Qiang Ju
- Shuai Wang
- Shunya Dang
- Tianpeng Li
- Xiangrong Zeng
- Yijie Zhou
- Chenzheng Zhu
- Da Pan
- Fei Deng
- Guangwei Ai
- Guosheng Dong
- Hongda Zhang
- Jinyang Tai
- Jian-Chen Hong
- Kai Lü
- Lei Sun
- Pingxing Guo
- Qian Ma
- Rui Xin
- Shihui Yang
- Shusen Zhang
- Y. L. Mo
- Zheng Liang
- Zhe Zhang
- Hongxia Cui
- Zheng Zhu
- Xiaochuan Wang