This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
One Sample to Rule Them All: Extreme Data Efficiency in Multidiscipline Reasoning with Reinforcement Learning
Citations: 0
Authors: 9
Year: 2026
Abstract
The reasoning ability of large language models (LLMs) can be unleashed with reinforcement learning (RL) (OpenAI, 2024; DeepSeek-AI et al., 2025a; Zeng et al., 2025). The success of existing RL approaches for LLMs usually relies on large volumes of high-quality samples. In this paper, we challenge conventional assumptions about data requirements in RL for LLMs by demonstrating the effectiveness of one-shot reinforcement learning. Specifically, we introduce polymath learning, a framework for designing one training sample that elicits multidisciplinary reasoning improvement. We present three key findings: (1) a single, strategically selected math reasoning sample can produce significant performance improvements across multiple domains, including physics, chemistry, and biology; (2) analysis of salient mathematical skills provides insight into the characteristics associated with effective polymath samples; and (3) an engineered synthetic sample that integrates multidisciplinary elements and broader skill coverage achieves stronger performance than naturally occurring individual samples. Across various reasoning benchmarks, polymath learning outperforms training on larger datasets, demonstrating that the reasoning structure and skills embodied in samples, rather than their quantity, may be the key to unlocking enhanced reasoning capabilities in language models. Our results suggest a shift, dubbed sample engineering, toward precision engineering of samples that complements simply increasing data volume.