OpenAlex · Updated hourly · Last updated: 15.05.2026, 15:48

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Prompting is All You Need: How to Make LLMs More Helpful for Clinical Decision Support

2026 · 1 citation · medRxiv · Open Access
Open full text at publisher

1 citation · 2 authors · 2026
Abstract

IMPORTANCE: Large language models (LLMs) offer potential decision support, but their accuracy varies. Prompt engineering can generally enhance LLM behavior, yet best practices have not been formally explored in realistic clinical contexts for neurology.

OBJECTIVE: To evaluate the impact of structured prompting versus naive prompting on the performance of five LLMs (two closed-source: OpenAI GPT-4o and OpenAI o3; three open-source: Meta Llama-4-Scout-17B-16E-Instruct, Llama-3.3-70B-Instruct-Turbo, and the reasoning model r1-1776) for thrombolytic clinical decision support (CDS) in acute stroke.

DESIGN: Models responded to three novel ischemic stroke vignettes using either a naive question ("Should this patient be offered thrombolytics?") or a five-step structured prompt (CARDS) guiding information extraction, timing analysis, contraindication checking, decision-process explanation, and risk-benefit discussion. Outputs were assessed across seven domains: guideline adherence, unsafe recommendations, risk recognition, guideline grading accuracy, inclusion of conversational explanation, clarity, and overall helpfulness.

RESULTS: Structured prompts significantly enhanced performance across most domains, with effects varying between model families. For the closed-source models (GPT-4o, o3), CARDS-style prompts improved guideline adherence from 83.3% to 100%, eliminated unsafe recommendations (16.7% to 0%), and increased specific guideline grading accuracy from 0% to 100%. The open-source reasoning model r1-1776 achieved the same top-tier outcomes with structured prompts (100% adherence, 0% unsafe, 100% grading, 100% conversational explanation), with grading and conversational explanation improving from 0%. In contrast, the other open-source models (Llama-4-Scout, Llama-3.3-70B) showed more modest gains: risk recognition improved (83.3% to 100%) and guideline grading accuracy increased (0% to 66.7%), while guideline adherence plateaued at 66.7% and unsafe recommendations persisted at 33.3%. Overall, structured prompting yielded the largest improvements in guideline grading accuracy and conversational reasoning across multiple models.

CONCLUSION AND RELEVANCE: Structured prompting substantially enhances LLM performance for acute stroke thrombolysis CDS. Notably, some models, including the proprietary GPT-4o and o3 and the open-source reasoning model r1-1776, achieved excellent safety and adherence with structured prompts. For clinical deployment of any LLM, structured prompts are crucial, and vigilant human oversight remains essential.
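The abstract names the five CARDS steps (information extraction, timing analysis, contraindication checking, decision-process explanation, risk-benefit discussion) but not the prompt's exact wording. A minimal Python sketch of how such a structured prompt might be assembled, with all step text being illustrative assumptions rather than the authors' actual prompt:

```python
# Hypothetical sketch of a five-step CARDS-style structured prompt.
# The step wording below is illustrative; the paper's exact prompt
# text is not reproduced here.

NAIVE_PROMPT = "Should this patient be offered thrombolytics?"

CARDS_STEPS = [
    "1. Extract the relevant clinical information from the vignette.",
    "2. Analyze symptom onset and timing against the thrombolysis window.",
    "3. Check for contraindications to thrombolytics.",
    "4. Explain your decision process step by step.",
    "5. Discuss the risks and benefits of thrombolysis for this patient.",
]


def build_structured_prompt(vignette: str) -> str:
    """Combine a stroke vignette with the five structured reasoning steps."""
    steps = "\n".join(CARDS_STEPS)
    return (
        f"Clinical vignette:\n{vignette}\n\n"
        "Work through the following steps before giving a recommendation:\n"
        f"{steps}\n\n"
        "Then state whether this patient should be offered thrombolytics."
    )


if __name__ == "__main__":
    print(build_structured_prompt("(example vignette text here)"))
```

The same vignette can then be sent to a model twice, once with `NAIVE_PROMPT` appended and once wrapped by `build_structured_prompt`, to compare outputs across the seven assessment domains.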

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling