This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
RadSearch, a Semantic Search Model for Accurate Radiology Report Retrieval with Large Language Model Integration
Citations: 3
Authors: 4
Year: 2025
Abstract
Background: Current radiology report search tools are limited to keyword searches, which lack semantic understanding of underlying clinical conditions and are prone to false positives. Semantic search models address this issue, but their development requires scalable methods for generating radiology-specific training data.
Purpose: To develop a scalable method for training semantic search models for radiology reports and to evaluate a model, RadSearch, trained using this method.
Materials and Methods: In this retrospective study, a scalable method for generating training examples for semantic search was applied to CT and MRI reports generated between December 2021 and January 2022, and was used to train the model RadSearch. RadSearch performance was evaluated using four internal test sets (including one subset) and one external test set from another large tertiary medical center, including chest, abdomen, and head CT reports generated between December 2015 and June 2023. Performance was evaluated for findings-to-impression matching, retrieving reports with the same examination type, retrieving reports relevant to free-text queries, and improving the ability of a large language model (LLM) (Llama 3.1 8B Instruct) to provide accurate diagnoses from report finding descriptions. RadSearch performance was compared with that of other embedding models specialized for symmetric (All MPNet Base) and asymmetric (MS MARCO DistilBERT Base) semantic search and a state-of-the-art semantic search model (GTE-large). A reference set of 100 diagnoses with common radiologic descriptions was used for the LLM evaluation. Findings-to-impression matching and free-text query accuracy P values were calculated using χ² and McNemar tests.
Results: The training set included 16 690 reports; the internal test sets included 13 598, 6178, and 9954 reports; and the external test set included 13 958 reports. For simulated free-text clinical queries, RadSearch successfully retrieved reports containing the specified findings for 83.0% (498 of 600) of reports and matching location for 89.8% (521 of 580) of reports, outperforming GTE-large, with performance at 65.7% (394 of 600; P < .001) and 58.8% (341 of 580; P < .001), respectively. For 100 report finding descriptions, the baseline accuracy of Llama 3.1 8B Instruct in providing the correct diagnosis without any embedding model search assistance was 30% (30 of 100), improving to 61% (61 of 100) with RadSearch integration (P < .001), which outperformed GTE-large integration (47% [47 of 100]; P = .03).
Conclusion: A semantic search model trained with scalable methods achieved state-of-the-art performance in retrieving reports with relevant findings and improved LLM diagnostic accuracy.
© RSNA, 2025. Supplemental material is available for this article. See also the editorial by Yasaka and Abe in this issue.
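As a rough illustration of the retrieval step described in the abstract, the sketch below ranks reports by cosine similarity between a query embedding and report embeddings, and recomputes the reported hit rates from their raw counts. It is not the RadSearch implementation: the three-dimensional vectors, the report identifiers, and the function names `cosine_similarity`, `retrieve`, and `hit_rate` are hypothetical, and a real system would obtain its embeddings from a trained model. Only the counts passed to `hit_rate` come from the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, report_vecs, k=2):
    """Return the ids of the k reports most similar to the query."""
    ranked = sorted(
        report_vecs,
        key=lambda rid: cosine_similarity(query_vec, report_vecs[rid]),
        reverse=True,
    )
    return ranked[:k]


def hit_rate(hits, total):
    """Percentage of queries with a relevant result, rounded to one decimal."""
    return round(100 * hits / total, 1)


# Hypothetical toy embeddings; a trained model would produce much longer vectors.
REPORTS = {
    "chest_ct_nodule": [0.9, 0.1, 0.0],
    "head_ct_bleed": [0.0, 0.2, 0.9],
    "abdomen_ct_appendicitis": [0.1, 0.9, 0.1],
}
QUERY = [0.8, 0.2, 0.1]  # hypothetical embedding of a free-text query

top_reports = retrieve(QUERY, REPORTS, k=2)

# Counts taken from the abstract (finding retrieval, then location matching).
radsearch_finding = hit_rate(498, 600)
gte_large_finding = hit_rate(394, 600)
radsearch_location = hit_rate(521, 580)
gte_large_location = hit_rate(341, 580)
```

With these toy vectors the chest CT report ranks first, because its embedding points in nearly the same direction as the query. The recomputed percentages match the values quoted in the abstract (83.0% and 65.7% for findings, 89.8% and 58.8% for location).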