OpenAlex · Updated hourly · Last updated: 13 March 2026, 17:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Benchmarking large language models for cardiovascular risk stratification using clinical vignettes

2025 · 0 citations · 11 authors · Open Access

Abstract

Large language models (LLMs) show promise for cardiovascular risk stratification, but their performance against clinical guidelines requires validation. We benchmarked eleven contemporary LLMs on 30 bilingual (Portuguese/English) outpatient vignettes, comparing their classifications against expert-adjudicated classifications based on European Society of Cardiology (ESC) guidelines using SCORE2. Models achieved near-perfect extraction of traditional risk factors (micro-F1 0.97–0.99) but only moderate agreement on the three-class ESC risk categories (best weighted kappa 0.69, 95% CI 0.44–0.84). Ten of the eleven models systematically underestimated risk. LLMs struggled with numeric SCORE2 computation: mean absolute error exceeded 5 percentage points for all but one model. Most models correctly identified guideline exceptions requiring alternative assessment beyond SCORE2 in more than 95% of cases. No significant performance differences between languages were found. While LLMs excel at structured data extraction and eligibility screening, their inconsistent risk stratification and poor numeric accuracy preclude autonomous clinical use and warrant further refinement.
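
As a rough illustration of two metrics named in the abstract, the sketch below computes Cohen's weighted kappa for ordered risk categories and the mean absolute error of numeric risk estimates. It is not the authors' evaluation code. The category labels, the function names, and the choice of linear weights (`power=1`) are assumptions made for illustration; the abstract does not state which weighting scheme was used.

```python
from collections import Counter

# Illustrative ordered labels for a three-class scheme (an assumption,
# not taken from the paper).
CATEGORIES = ["low-to-moderate", "high", "very-high"]


def weighted_kappa(reference, predicted, categories=CATEGORIES, power=1):
    """Cohen's weighted kappa for ordinal labels.

    power=1 gives linear weights, power=2 gives quadratic weights.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    index = {label: i for i, label in enumerate(categories)}
    k, n = len(categories), len(reference)

    def disagreement(a, b):
        # 0 for identical classes, 1 for the two most distant classes.
        return (abs(index[a] - index[b]) / (k - 1)) ** power

    # Mean disagreement actually observed between the two label lists.
    observed = sum(disagreement(r, p) for r, p in zip(reference, predicted)) / n

    # Mean disagreement expected by chance from the marginal frequencies.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(
        disagreement(a, b) * ref_counts[a] * pred_counts[b]
        for a in categories
        for b in categories
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both lists use one identical class")
    return 1.0 - observed / expected


def mean_absolute_error(reference, predicted):
    """Mean absolute difference, e.g. in percentage points of estimated risk."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty numeric lists of equal length")
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)
```

Under these assumptions, `weighted_kappa(expert_labels, model_labels)` would be called once per model, with `expert_labels` and `model_labels` as hypothetical per-vignette label lists. A confidence interval like the one reported in the abstract would need an additional resampling or analytic step that is not shown here.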

Similar works