This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of Large Language Models for Radiology Report Impression Generation: A Systematic Review
Citations: 0
Authors: 3
Year: 2026
Abstract
No systematic review has previously examined the application of large language models (LLMs) for generating impressions from radiology report findings. This study systematically reviews the performance of LLMs on this task and the associated evaluation methodologies. A search of seven electronic databases on 7 August 2025 identified 15 eligible papers (average quality score: 71.4%). These articles evaluated 35 LLMs, including 21 base models. The reported performance ranges were as follows: Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1, 35.9% (Generative Pre-Trained Transformer (GPT)-4) to 69.7% (Baichuan2-13B); ROUGE-2, 13.4% (Large Language Model Meta AI (Llama)) to 52.4% (Baichuan2-13B); and ROUGE-L, 16.5% (Chat General Language Model–Medical (ChatGLM-Med)) to 63.8% (finetuned Text-to-Text Transfer Transformer (T5)). The finetuned T5 consistently demonstrated high performance, based on Bidirectional Encoder Representations from Transformers Score (BERTScore): 89.2%; BiLingual Evaluation Understudy (BLEU)-1: 65.2%; BLEU-2: 57.9%; BLEU-3: 52.5%; BLEU-4: 48.3%; Metric for Evaluation of Translation with Explicit ORdering (METEOR): 38.1%; ROUGE-1: 59.9%; ROUGE-2: 50.9%; ROUGE-L: 63.8%; and subjective metrics (clinical usability: 4.5/5.0; completeness: 4.3/5.0; conciseness: 4.3/5.0; fluency: 4.4/5.0). These results, based on 132,043 computed tomography, echocardiography, magnetic resonance imaging, and X-ray reports, indicate the finetuned T5's strong clinical potential for assisting radiologists in impression generation through supervised finetuning rather than through the prompting techniques used with closed-source LLMs.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,635 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,543 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,051 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,844 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations