OpenAlex · Updated hourly · Last updated: 9 April 2026, 11:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

TOOL-CURE: Tool Selection via Curriculum-Enhanced Reinforcement Learning with Sample Screening for LLMs

2026 · 0 citations · Open Access
Citations: 0 · Authors: 6 · Year: 2026

Abstract

Large language models (LLMs) are increasingly deployed as intelligent agents capable of executing complex real-world tasks through external tool interactions, but effective tool selection remains challenging due to the inherent limitations of real-world training data. These datasets suffer from severe tool imbalance following long-tail distributions, data scarcity for specialized tools, logic conflicts between user queries and available tools, rapidly evolving toolsets, and the presence of subpar samples including partially correct and dirty examples. Existing supervised fine-tuning (SFT) approaches struggle with these multifaceted challenges as they require abundant high-quality data, treat all labeled examples as ground truth regardless of quality, and lack the flexibility to generalize beyond specific query-tool pairings seen during training. While reinforcement learning (RL) offers a promising alternative through outcome-based learning, vanilla approaches like Group Relative Policy Optimization (GRPO) suffer from training instability due to conflicting reward signals and inefficient learning from weak signals. To address these issues, we propose TOOL-CURE, a novel method with two key improvements to GRPO: Proficiency-Scaled Curriculum Learning (PSCL), which organizes training into a two-stage curriculum that builds foundational skills on easier samples before progressing to harder ones, and Online Policy Guarding via Sample Screening (OPGSS), which continuously assesses rollout quality and masks dirty samples to prevent noisy gradients from destabilizing policy updates. Our approach enables stable and efficient learning from heterogeneous real-world data, resulting in a robust tool-selection agent that demonstrates significant improvements in accuracy and generalization capability. Our code is available at: https://github.com/einnullnull/TOOL-CURE.git.
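The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that difficulty is approximated by a per-sample pass rate of the current policy, that the curriculum is a simple threshold split into an easy and a hard stage, and that sample screening yields a boolean keep/mask flag per rollout. The function names, the threshold, and the use of population standard deviation are all hypothetical and are not taken from the TOOL-CURE code.

```python
from statistics import mean, pstdev


def split_curriculum(samples, pass_rates, threshold=0.5):
    """Split samples into an easy stage and a hard stage.

    Assumption: a higher pass rate of the current policy means an easier
    sample. The threshold value is illustrative only.
    """
    easy = [s for s, p in zip(samples, pass_rates) if p >= threshold]
    hard = [s for s, p in zip(samples, pass_rates) if p < threshold]
    return easy, hard


def masked_group_advantages(rewards, keep_mask, eps=1e-6):
    """Group-normalized advantages in the style of GRPO.

    Rollouts whose keep_mask entry is False (flagged as dirty by some
    screening step, not shown here) are excluded from the group statistics
    and receive an advantage of 0, so they add nothing to the gradient.
    """
    kept = [r for r, k in zip(rewards, keep_mask) if k]
    if len(kept) < 2:
        # Too few clean rollouts to normalize; skip the whole group.
        return [0.0] * len(rewards)
    mu = mean(kept)
    sigma = pstdev(kept)
    return [
        (r - mu) / (sigma + eps) if k else 0.0
        for r, k in zip(rewards, keep_mask)
    ]


# Stage 1 would train on `easy`, stage 2 on `hard`.
easy, hard = split_curriculum(["q1", "q2", "q3"], [0.9, 0.2, 0.6])

# Four rollouts for one query; the third is flagged as dirty and masked.
advantages = masked_group_advantages(
    [1.0, 0.0, 1.0, 0.0], [True, True, False, True]
)
```

In this sketch the masked rollout is left out of the mean and standard deviation as well as the update, so a mislabeled sample cannot shift the baseline for the clean rollouts in its group. Whether the authors' method does the same is not stated in the abstract.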

Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Domain Adaptation and Few-Shot Learning