OpenAlex · Updated hourly · Last updated: 8 May 2026, 12:20

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Few-shot prompting strategies for improving large language model-based cardiovascular disease risk prediction

2026 · 0 citations · 7 authors · Open Access

Abstract

Accurate prediction of cardiovascular disease (CVD) risk enables earlier prevention and better clinical decisions. Conventional models such as the Framingham Risk Score (FRS) and the Atherosclerotic Cardiovascular Disease (ASCVD) equations may generalize poorly across diverse populations and when electronic health records (EHRs) are incomplete. In this paper, we present a prompting-based approach that uses few-shot in-context learning to guide large language models (LLMs) in estimating 10-year CVD risk without retraining, offering a data-efficient and privacy-conscious alternative to fine-tuned medical LLM pipelines. Using 352 de-identified MIMIC-III/IV records, we compare GPT-4.1, GPT-4o, and Qwen3-4B against FRS and ASCVD outputs under zero-shot and few-shot prompting, with random versus similarity-based exemplar selection, and with or without chain-of-thought reasoning. Few-shot prompting substantially improves agreement with the calculator outputs for GPT-4.1 and GPT-4o, whereas Qwen3-4B gains less. With 40 examples and reasoning enabled, GPT-4.1 achieves an AUPRC of 0.951, a mean absolute error of about 7, a root mean squared error of about 9, and an F1-score of 0.85; GPT-4o performs comparably. In the similarity analysis restricted to the white cohort, five similarity-selected exemplars match or outperform 20 randomly selected exemplars across error and discrimination metrics, showing that exemplar quality can outweigh quantity under tight context budgets. Overall, these findings indicate that few-shot prompting can closely approximate validated clinical calculators in data-limited settings and can be adapted across institutions and patient populations by changing exemplar selection rather than by retraining. However, clinical utility remains bounded by the strengths and weaknesses of the underlying calculators, and we do not evaluate prediction of observed cardiovascular events.
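The abstract does not specify the patient features, similarity measure, or prompt template the authors used. The following is therefore only a minimal sketch, under stated assumptions, of the general technique it describes: pick the k most similar labeled records (or k random ones as a baseline), assemble them into a few-shot prompt with the calculator risk as the label, and score model outputs against calculator outputs with MAE and RMSE. The feature set, the Euclidean distance over z-scored features, and all names (`Record`, `select_similar`, `select_random`, `build_prompt`) are hypothetical illustrations, not the paper's method.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical feature set, loosely modeled on Framingham/ASCVD-style inputs.
FEATURES = ("age", "systolic_bp", "total_chol", "hdl", "smoker", "diabetic")


@dataclass
class Record:
    age: float
    systolic_bp: float
    total_chol: float
    hdl: float
    smoker: int
    diabetic: int
    risk_pct: Optional[float] = None  # calculator 10-year risk; set for exemplars only


def _vec(r: Record) -> List[float]:
    return [float(getattr(r, f)) for f in FEATURES]


def select_similar(pool: Sequence[Record], query: Record, k: int) -> List[Record]:
    """Return the k pool records closest to `query` in z-scored feature space (assumed metric)."""
    cols = list(zip(*(_vec(r) for r in pool)))
    means = [sum(c) / len(c) for c in cols]
    # Population SD; fall back to 1.0 for constant features to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]

    def z(r: Record) -> List[float]:
        return [(x - m) / s for x, m, s in zip(_vec(r), means, sds)]

    q = z(query)
    return sorted(pool, key=lambda r: math.dist(z(r), q))[:k]


def select_random(pool: Sequence[Record], k: int, seed: int = 0) -> List[Record]:
    """Baseline: k exemplars drawn uniformly at random without replacement."""
    return random.Random(seed).sample(list(pool), k)


def _describe(r: Record) -> str:
    return ", ".join(f"{f}={getattr(r, f)}" for f in FEATURES)


def build_prompt(exemplars: Sequence[Record], query: Record, reasoning: bool = False) -> str:
    """Hypothetical template: labeled exemplars first, then the unlabeled query.

    An empty exemplar list yields a zero-shot prompt.
    """
    parts = ["Estimate the patient's 10-year cardiovascular disease risk in percent."]
    if reasoning:
        parts.append("Think step by step before giving the final number.")
    for ex in exemplars:
        parts.append(f"Patient: {_describe(ex)}\n10-year CVD risk: {ex.risk_pct:.1f}%")
    parts.append(f"Patient: {_describe(query)}\n10-year CVD risk:")
    return "\n\n".join(parts)


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between model estimates and calculator outputs."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean squared error between model estimates and calculator outputs."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))
```

In use, one would call `select_similar(pool, query, k=5)` or `select_random(pool, k=20)`, pass the result to `build_prompt`, send the prompt to an LLM, parse the returned percentage, and compare the collected estimates with the calculator values via `mae` and `rmse`. Z-scoring is one simple way to stop features with large units (cholesterol, blood pressure) from dominating the distance; the paper may well use a different similarity measure.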
