This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Deep Reinforcement Learning-Driven Adaptive Prompting for Robust Medical LLM Evaluation
Citations: 0
Authors: 4
Year: 2026
Abstract
The accurate and reliable evaluation of large language models (LLMs) in medical domains is critical for real-world clinical deployment, automated medical reasoning, and patient safety. However, the evaluation process is highly sensitive to prompt design, and the prevalent reliance on fixed or randomly sampled prompt policies often fails to adapt dynamically to clinical context, question complexity, or evolving safety requirements. This article presents a novel reinforcement learning-based framework for multi-prompt selection that dynamically optimizes the prompt policy per input for medical LLM evaluation across the Medical Knowledge Question-Answering dataset (MKQA), the Medical Multiple-Choice Question dataset (MCQ), and the Doctor-Patient Dialogue dataset. We formulate prompt selection as a Markov Decision Process (MDP) and employ a deep Q-Network (DQN) agent to maximize a reward signal incorporating textual accuracy, domain terminology coverage, safety, and dialogue relevance. Experiments on three medical LLM benchmarks demonstrate consistent improvements in composite reward over baselines (e.g., a 6.66% increase on MKQA vs. the Random Baseline and a 2.41% increase on Dialogue vs. the Fixed Baseline). These gains were accompanied by robust enhancements in Safety (e.g., 1.0000 on MKQA, a 5.26% increase vs. the Fixed Baseline, and a 5.03% increase on Dialogue vs. the Fixed Baseline) and substantial gains in Medical Terminology Coverage (e.g., a 74.61% increase on MKQA and a 9.13% increase on MCQ, both vs. the Fixed Baseline). Although accuracy gains varied across tasks, an improvement was observed on MKQA, and the framework effectively optimizes the multi-objective reward function even where minor trade-offs in other metrics, such as Accuracy and Contextual Relevance, were observed in some contexts.
Our framework enables robust, context-aware, and adaptive evaluation, laying a foundation for safer and more reliable LLM application in healthcare.
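The abstract describes prompt selection as an MDP in which an agent picks a prompt template per input to maximize a composite reward over accuracy, terminology coverage, safety, and relevance. The following is a minimal sketch of that idea, with several loud assumptions: the prompt pool, reward weights, and simulated evaluator metrics are all illustrative inventions (the paper's actual values are not given here), and a simple tabular Q-update stands in for the paper's deep Q-Network.

```python
import random

# Hypothetical prompt templates; the paper's actual prompt pool is an assumption here.
PROMPTS = ["direct", "chain_of_thought", "safety_first", "terminology_rich"]

def composite_reward(metrics, w=(0.4, 0.2, 0.3, 0.1)):
    """Weighted sum over (accuracy, terminology coverage, safety, relevance).
    The weights are illustrative, not the paper's."""
    return sum(wi * m for wi, m in zip(w, metrics))

def select_prompt(q_table, state, epsilon=0.1):
    """Epsilon-greedy action selection over prompt templates."""
    if random.random() < epsilon:
        return random.randrange(len(PROMPTS))
    qs = q_table[state]
    return max(range(len(PROMPTS)), key=lambda a: qs[a])

def q_update(q_table, state, action, reward, alpha=0.5):
    """One-step tabular Q-value update; the paper uses a DQN instead."""
    q_table[state][action] += alpha * (reward - q_table[state][action])

# Usage: states are coarse question types; rewards come from a simulated evaluator.
random.seed(0)
q = {"mkqa": [0.0] * len(PROMPTS), "mcq": [0.0] * len(PROMPTS)}
for _ in range(200):
    s = random.choice(list(q))
    a = select_prompt(q, s)
    # Simulated per-prompt metrics (accuracy, coverage, safety, relevance);
    # each task type favors a different prompt, so the policy must adapt per state.
    base = 0.5 + 0.1 * a if s == "mkqa" else 0.8 - 0.1 * a
    r = composite_reward((base, base, 1.0, base))
    q_update(q, s, a, r)
```

The per-state Q-values converge toward the composite reward of each prompt, so the greedy policy learns a different prompt choice for each question type, which is the adaptive behavior the abstract contrasts with fixed or random baselines.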
Similar Works
"Why Should I Trust You?"
2016 · 14,218 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,589 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,386 citations