This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Fine-Tuned Large Language Models for Automated Radiology Impression Generation: A Multicenter Evaluation
Citations: 0
Authors: 20
Year: 2026
Abstract
Purpose To develop a fine-tuned large language model (LLM), the Medical Imaging Report Assistant (MIRA), and to evaluate its performance in generating radiology impressions from multicenter data with respect to accuracy, reporting efficiency, and clinical applicability.
Materials and Methods A retrospective multicenter dataset comprising 1.87 million radiology reports (CT, MRI, and digital radiography) from 42 hospitals across 22 provinces in China (January 2019 to August 2024) was compiled and used to fine-tune an LLM via a prompt-based strategy. The evaluation framework incorporated both automated and human evaluation metrics. Radiologists evaluated internal and external datasets and three open-source datasets to compare impressions generated by the fine-tuned LLM and GPT-4o. Twenty-four radiologists from six centers performed blinded comparisons of MIRA-generated and reference impressions to assess interrater consistency and drafting efficiency. Data were analyzed using appropriate parametric or nonparametric tests and χ² tests, with Holm-Bonferroni correction for multiple comparisons.
Results The internal test set included 78,544 reports (median age, 52 years [IQR, 35-65]; 39,351 males), and the external test set included 27,471 reports (median age, 53 years [IQR, 37-66]; 13,955 males). Site/modality-aware prompting improved similarity (P < .001): BERTScore-F/Sentence Similarity reached 0.92/0.92 on the internal set and 0.82/0.80 on the external set under optimal settings. In human evaluation (n = 2,327), MIRA outperformed GPT-4o on both similarity and F1 score (P < .001). MIRA-generated impressions were rated as at least as good as the reference impressions in 69.0% of blinded comparisons (1,657/2,400), reduced drafting time by 0.46 min per report, and increased interradiologist agreement (P < .001).
Conclusion MIRA, a fine-tuned LLM using a prompt-based strategy, generated clinically aligned radiology impressions in multicenter settings, improving accuracy, efficiency, and reporting consistency. © The Authors 2026. Published by the Radiological Society of North America under a CC BY 4.0 license.
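The abstract names Holm-Bonferroni correction for multiple comparisons without describing the procedure. The following is a minimal sketch of the standard step-down method, not the authors' analysis code. The function name `holm_bonferroni`, the significance level of 0.05, and the example p-values are all hypothetical and do not come from the paper. The last lines only re-derive the 69.0% figure from the counts reported in the abstract.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Standard Holm step-down procedure: sort p-values ascending, compare the
    k-th smallest (k = 0, 1, ...) against alpha / (m - k), and stop at the
    first comparison that fails. alpha=0.05 is an assumed default.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained as well.
            break
    return reject


# Hypothetical p-values, for illustration only (not taken from the paper).
example_p = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(example_p)

# Consistency check on counts reported in the abstract: 1,657 of 2,400.
share_at_least_as_good = round(100 * 1657 / 2400, 1)
```

With these illustrative inputs, the two smallest p-values (0.005 and 0.01) are rejected and the procedure stops at 0.03, which exceeds its threshold of 0.025.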
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,513 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,407 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,882 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,571 citations
Authors
Institutions
- Jilin University(CN)
- First Hospital of Jilin University(CN)
- Harbin Medical University(CN)
- First Affiliated Hospital of Harbin Medical University(CN)
- Ningbo No. 2 Hospital(CN)
- Ningbo No. 6 Hospital(CN)
- Gaochun People's Hospital(CN)
- The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture(CN)
- Jiangsu Province Hospital(CN)
- Fujian Medical University(CN)
- First Affiliated Hospital of Fujian Medical University(CN)
- China Medical University(CN)