This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
An Open-Source Large Language Model (LLM) Demonstrates Potential for Labeling Features From Radiology Reports
Citations: 0
Authors: 6
Year: 2025
Abstract
RATIONALE: Accurate text interpretation using traditional natural language processing (NLP) approaches has historically been a roadblock to incorporating text-based findings into clinical decision support tools. Advances in NLP, most notably large language models (LLMs), have unlocked the ability to identify clinically relevant results in unstructured text reports. Here we apply the open-source Llama-3 LLM to chest radiograph reports and evaluate its accuracy at labeling radiology text reports for features relevant to building pneumonia severity clinical decision support.

METHODS: Chest radiograph reports were obtained from patients with an ICD-10 primary diagnosis of pneumonia present on admission (J9-J18) who were admitted to a single center in New York City from January 2017 to September 2019. Llama-3 was run locally using the Ollama Python package to evaluate a random selection of 200 radiology reports. All prompts began with initial text setting the context for the LLM; these initial instructions were created with trial-and-error tuning. A Python script prompted Llama-3 with three questions for each radiology report: "Does the chest x-ray show a consolidation?"; "How many sides of the lungs show evidence of pneumonia?"; and "Does the chest x-ray show evidence of an atypical pneumonia?" The LLM was instructed to answer "Yes" or "No" for the first prompt; "No sides," "One side," or "Two sides" for the second prompt; and "Yes," "No," or "Maybe" for the third prompt. To evaluate the performance of the LLM, two clinicians reviewed the radiology reports and answered the same questions. Percent agreement and Cohen's Kappa were calculated between the clinicians. Any disagreements were arbitrated by a third clinician, and the resulting clinician labels served as the reference for comparison to Llama-3.
RESULTS: The percent agreement and Cohen's Kappa for the initial two clinician reviewers and between Llama-3 and the clinician reference are shown in Table 1.

CONCLUSIONS: Clinicians demonstrated moderate to good agreement at labeling features from radiology reports, while Llama-3 demonstrated moderate to poor agreement, depending on the feature. Radiology reports carry some ambiguity, and even clinicians could not achieve perfect agreement labeling features. In the future, further tuning of the LLM prompt and clarifying the questions may improve performance, and forthcoming model releases will also likely improve performance. Llama-3 has potential for labeling features from radiology reports for a variety of uses, including clinical decision support and predictive modeling.
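The two agreement statistics reported in Table 1 are standard and can be computed directly from two raters' label lists. A minimal sketch (the function names are illustrative, not from the paper):

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    assert len(a) == len(b) and a
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(a) == len(b) and a
    n = len(a)
    p_o = percent_agreement(a, b)                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_e = sum(ca[l] * cb[l] for l in labels) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0
```

For example, with labels `["Yes", "Yes", "No", "No"]` and `["Yes", "No", "No", "No"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.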
Related Works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 28,817 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,514 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,745 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,103 citations