
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Development of an LLM Pipeline Surpassing Physicians in Cardiovascular Risk Score Calculation

2025 · 2 citations · 10 authors · Open Access

Abstract

Background: Risk scores are essential to evidence-based cardiovascular care, but manual calculation is labor-intensive and error-prone. Large language models (LLMs) could automate this process, yet they are prone to calculation errors and factual hallucinations. Pipelines that separate LLM-based data extraction from deterministic score computation may improve reliability and transparency.

Methods: We conducted a retrospective diagnostic study at a quaternary heart center in Germany (January 2020 to July 2023). Patients with atrial fibrillation from an ablation registry (n=179) and patients with severe aortic stenosis evaluated by a heart team (n=76) were included. Five LLMs (DeepSeek-R1, Qwen3, GPT-4 Turbo, Llama 3.1, and PaLM 2) were tested in standalone and pipeline configurations to compute HAS-BLED, CHA₂DS₂-VASc, and EuroSCORE II scores from routine clinical reports. Accuracy was assessed by comparing predictions with expert-adjudicated ground truth, using root mean squared error (RMSE), Krippendorff’s alpha for categorical agreement, and calibration analysis.

Results: Pipeline-generated scores agreed substantially better with expert adjudication than scores from standalone LLMs or treating clinicians (mean Krippendorff’s alpha: 0.79 vs 0.32 and 0.31, respectively) and were better calibrated. The Qwen3-based pipeline achieved the highest accuracy, with lower RMSEs than clinicians for HAS-BLED (0.20 vs 0.87), CHA₂DS₂-VASc (0.53 vs 1.08), and EuroSCORE II (1.99 vs 2.05).

Conclusion: LLM-based pipelines enable accurate, well-calibrated, and scalable cardiovascular risk score computation from unstructured real-world clinical data, outperforming clinicians and standalone LLMs, with the potential to reduce clinician workload and support evidence-based care.
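The separation the abstract describes, between LLM-based extraction and deterministic score computation, can be illustrated with a minimal sketch. It is not the authors' implementation: the `ExtractedFindings` structure and its field names are hypothetical stand-ins for whatever the extraction step returns, and only the scoring step is shown. The point values follow the standard CHA₂DS₂-VASc definition, and `rmse` is the usual root mean squared error against a reference.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ExtractedFindings:
    """Hypothetical structured output of the LLM extraction step."""
    age: int
    female: bool
    heart_failure: bool
    hypertension: bool
    diabetes: bool
    stroke_or_tia: bool      # prior stroke, TIA, or thromboembolism
    vascular_disease: bool


def cha2ds2_vasc(f: ExtractedFindings) -> int:
    """Deterministic CHA2DS2-VASc score; no LLM arithmetic involved."""
    score = 0
    score += 1 if f.heart_failure else 0
    score += 1 if f.hypertension else 0
    # Age contributes 2 points at 75 or older, 1 point for 65 to 74.
    if f.age >= 75:
        score += 2
    elif f.age >= 65:
        score += 1
    score += 1 if f.diabetes else 0
    score += 2 if f.stroke_or_tia else 0
    score += 1 if f.vascular_disease else 0
    score += 1 if f.female else 0
    return score


def rmse(predicted: list[float], reference: list[float]) -> float:
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sqrt(sum(squared) / len(squared))
```

Because the score is computed by ordinary code, any disagreement with the reference can be traced to a specific extracted field rather than to the model's arithmetic.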

Similar works