OpenAlex · Updated hourly · Last updated: 27.04.2026, 07:40

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

2025 · 0 citations · ArXiv.org · Open Access
Open full text at publisher

Citations: 0 · Authors: 11 · Year: 2025

Abstract

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.
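The abstract's core distinction, step-level versus answer-level supervision and PRM-guided test-time scaling, can be made concrete with a short sketch. Everything below is a hypothetical illustration under assumed names (Trajectory, orm_score, prm_scores, best_of_n) and toy scoring rules, not code from any surveyed system:

```python
# Minimal sketch (all names and scoring rules hypothetical): contrasts an
# outcome reward model (ORM), which judges only the final answer, with a
# process reward model (PRM), which scores each intermediate step, and uses
# the PRM for best-of-N selection, a simple form of test-time scaling.
from dataclasses import dataclass

@dataclass
class Trajectory:
    steps: list[str]       # intermediate reasoning steps
    final_answer: str      # final answer extracted from the trajectory

def orm_score(traj: Trajectory) -> float:
    """ORM: a single scalar that looks only at the final answer."""
    return 1.0 if traj.final_answer == "42" else 0.0   # toy answer checker

def prm_scores(traj: Trajectory) -> list[float]:
    """PRM: one score per step (a real PRM would be a learned model;
    this heuristic just penalizes steps flagged as guesses)."""
    return [0.2 if "guess" in step else 0.9 for step in traj.steps]

def best_of_n(candidates: list[Trajectory]) -> Trajectory:
    """Best-of-N reranking: aggregate step scores with min(), so a single
    weak step sinks the whole trajectory."""
    return max(candidates, key=lambda t: min(prm_scores(t)))

# Two trajectories with the same (correct) final answer: the ORM cannot
# tell them apart, but the PRM prefers the one with sound intermediate steps.
candidates = [
    Trajectory(["guess wildly", "so the answer is 42"], "42"),
    Trajectory(["let x be the unknown", "derive x = 42", "so the answer is 42"], "42"),
]
assert orm_score(candidates[0]) == orm_score(candidates[1])  # ORM: tie
print(best_of_n(candidates).steps)  # PRM picks the well-reasoned trajectory
```

Min-aggregation is just one way to collapse per-step scores into a trajectory-level rank; product or last-step aggregation are common alternatives.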

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications