OpenAlex · Updated hourly · Last updated: 19 May 2026, 03:14

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A systematic evaluation of open-source large language models for automated extraction of cardiac MRI parameters from unstructured reports

2025 · 0 citations · European Heart Journal

Citations: 0 · Authors: 8 · Year: 2025

Abstract

Background: Cardiac magnetic resonance imaging (CMR) parameters are often stored in heterogeneous, unstructured clinical reports. Manual adjudication of these parameters is time-consuming and can require domain expertise. Recent open-source large language models (LLMs) have demonstrated impressive performance on language-task benchmarks. Moreover, they are cost-effective and readily adaptable to data privacy requirements and custom use cases. To date, no study has systematically evaluated the performance of state-of-the-art (SOTA) open-source LLMs in extracting cardiac parameters from real-world CMR reports.

Purpose: To investigate and compare the ability of several SOTA open-source LLMs to automatically and accurately extract key cardiac parameters from unstructured CMR reports.

Methods: We retrospectively collected 1108 CMR reports from a single academic institution. Seven open-source LLMs varying in parameter count (2 to 9 billion) and pretraining corpus (general vs biomedical text) were evaluated: Gemma 2-2B, Gemma 2-9B, Llama 3.2-3B, Llama 3.1-8B, Qwen 2.5-7B, BioMistral-7B, and Meditron 3-8B. We chose LLMs with fewer than 10B parameters to align with the computational capacity of GPUs commonly used in research settings. Each model was prompted to extract the following CMR parameters: cardiac output (CO), cardiac index (CI), left and right ventricular (LV/RV) ejection fraction (LVEF, RVEF), LV/RV end-systolic volume index (LVESVI, RVESVI), LV/RV stroke volume index (LVSVI, RVSVI), LV late gadolinium enhancement (LV LGE), LV LGE type (ischemic, non-ischemic, mixed), and T2 positivity. Results were pooled across three runs to account for LLM stochasticity. Concordance between model outputs and human expert-adjudicated values was computed.

Results: Gemma 2-9B achieved the highest concordance with human annotation on 6 of the 11 CMR parameters: CI (100%), CO (99%), LVEF (97%), RVEF (99%), LVSVI (97%), and RVSVI (99%) (Figure 1A). Qwen 2.5-7B performed best for LVESVI (98%), RVESVI (99%), and T2 positivity (95%). T2 positivity exhibited the highest average concordance across models (92.4%), while LV LGE type had the lowest (76.3%). Larger-parameter models (Gemma 2-9B, Llama 3.1-8B) consistently outperformed their smaller-parameter counterparts (Gemma 2-2B, Llama 3.2-3B) (Figure 1A). Surprisingly, the two medical-domain models, BioMistral-7B and Meditron 3-8B, generally performed worse than the non-medical LLMs. Indeed, BioMistral-7B was the worst-performing LLM for all 11 CMR parameters (Figure 1B). This suggests that medical (domain-specific) pretraining may negatively affect adjudication performance.

Conclusion: Open-source LLMs show promise for automated and accurate extraction of cardiac parameters from unstructured CMR reports. Larger-parameter, general-pretrained LLMs, rather than LLMs trained on biomedical data, provide more accurate adjudications.

Figure 1. LLM CMR adjudication performance.
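The abstract states that results were pooled across three runs and that concordance with expert-adjudicated values was computed, but it does not define the metric in detail. The following is a minimal sketch of one plausible reading, assuming concordance is the fraction of (run, report) pairs in which the extracted value agrees with the reference, with a missing value counted as agreement only when the reference is also missing. The function names, the tolerance parameter, and the toy values are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Optional, Sequence


def values_agree(pred: Optional[float], ref: Optional[float], tol: float = 0.0) -> bool:
    """Return True if both values are missing, or both are present and within tol."""
    if pred is None or ref is None:
        return pred is None and ref is None
    return abs(pred - ref) <= tol


def pooled_concordance(
    runs: Sequence[Sequence[Optional[float]]],
    reference: Sequence[Optional[float]],
    tol: float = 0.0,
) -> float:
    """Fraction of (run, report) pairs where the model output agrees with the reference.

    Assumption: pooling means counting every run of every report equally.
    """
    if not runs or not reference:
        raise ValueError("need at least one run and one reference value")
    matches = 0
    total = 0
    for run in runs:
        if len(run) != len(reference):
            raise ValueError("each run must cover every report")
        for pred, ref in zip(run, reference):
            matches += values_agree(pred, ref, tol)
            total += 1
    return matches / total


# Hypothetical toy data: LVEF (%) for four reports; None means "not reported".
reference = [55.0, 62.0, None, 48.0]
runs = [
    [55.0, 62.0, None, 48.0],  # run 1: all four agree
    [55.0, 60.0, None, 48.0],  # run 2: second report differs
    [55.0, 62.0, 30.0, 48.0],  # run 3: value extracted where none was reported
]

concordance = pooled_concordance(runs, reference)
print(f"Pooled concordance: {concordance:.1%}")
```

Under these assumptions, 10 of the 12 pooled pairs agree. Categorical parameters such as LV LGE type would need an equality check on labels rather than a numeric tolerance.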

Topics

Artificial Intelligence in Healthcare and Education · Cardiac Imaging and Diagnostics · Machine Learning in Healthcare