This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
An Open-source Fine-tuned Large Language Model for Radiological Impression Generation: A Multi-reader Performance Study
Citations: 1
Authors: 10
Year: 2024
Abstract
Background The impression section integrates the key findings of a radiology report but can be subjective and variable. A fine-tuned open-source Large Language Model (LLM) was evaluated on its ability to generate radiological report impressions across different imaging modalities and hospitals. We sought to clinically validate an open-source fine-tuned LLM that automatically generates impressions to summarize radiology reports.
Methods In this institutional review board-approved retrospective study, we fine-tuned an open-source LLM to generate the impression from the remainder of the radiology report. CT, US, and MRI radiology reports from Hospital 1 (n = 372,716) and Hospital 2 (n = 60,049), both under a single institution, were included in this study. The ROUGE score was used for automatic natural language evaluation, and a reader study with five thoracic radiologists was performed for a clinical evaluation of CT chest impressions with a subspecialist baseline. We also stratified the results of the reader performance study by diagnosis category and by original impression length to gauge case complexity.
Results The large language model achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on the Hospital 1 dataset for the CT, US, and MRI modalities, respectively. Upon external validation on the independent Hospital 2 test dataset, the model achieved ROUGE-L scores of 40.74, 37.89, and 24.61 for the same modalities. In the reader performance study, the model achieved overall mean scores of 3.56/4 for clinical accuracy, 3.92/4 for grammatical accuracy, and 3.37/4 for stylistic quality, with a mean edit time of 18.29 seconds and a mean edit distance of 12.32 words. The LLM achieved its highest clinical accuracy ratings for acute/emergent findings. In terms of impression length, the LLM performed best in clinical accuracy on shorter impressions.
Conclusions We demonstrated that an open-source fine-tuned LLM can generate high-quality radiological impressions of clinical accuracy, grammatical accuracy, and stylistic quality across multiple imaging modalities and hospitals.
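For readers unfamiliar with the ROUGE-L metric reported in the Results, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of words shared by a reference impression and a generated one. The abstract does not state which ROUGE implementation, tokenization, or variant the authors used, so the whitespace tokenization, the balanced F-measure, the 0 to 100 scaling, and the function names `lcs_length` and `rouge_l` are all assumptions made for illustration; the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found for both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best LCS seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F-measure on a 0-100 scale (assumed: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated words in the LCS
    recall = lcs / len(ref)      # share of reference words in the LCS
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example strings, not taken from the study data.
    reference = "no acute cardiopulmonary abnormality"
    generated = "no acute abnormality"
    print(round(rouge_l(reference, generated), 2))
```

Because ROUGE-L only rewards word overlap in order, a clinically equivalent impression phrased differently can score low, which is one reason the study pairs the automatic metric with a radiologist reader evaluation.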
Similar works
Refinement and reassessment of the SERVQUAL scale.
1991 · 3,966 citations
Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review
2005 · 3,755 citations
Radiobiology for the Radiologist.
1974 · 3,501 citations
International evidence-based recommendations for point-of-care lung ultrasound
2012 · 2,807 citations
Radiation Dose Associated With Common Computed Tomography Examinations and the Associated Lifetime Attributable Risk of Cancer
2009 · 2,426 citations