This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large Language Model-Generated Expansion of the RadLex Ontology: Application to Multinational Datasets of Chest CT Reports
Citations: 0
Authors: 8
Year: 2026
Abstract
<b>Background:</b> RadLex is a widely used radiology-specific ontology that standardizes terminology for clinical and research uses. However, the ontology's coverage of clinical radiology reports remains limited due to radiologists' linguistic variation. <b>Objective:</b> To use a large language model (LLM) to generate an expanded set of lexical variants and synonyms for the RadLex ontology and to evaluate the impact of this expansion on lexical coverage and semantic term recognition using clinical radiology reports. <b>Methods:</b> This retrospective study used an LLM (Gemini 2.0 Flash Thinking) to generate an expansion (lexical variants [morphologic variants, orthographic variants, and acronyms and abbreviations] and strict semantic synonyms) of the 40,000 RadLex preferred terms, with detailed constraints to ensure semantic alignment. Five datasets of clinical chest CT reports were obtained (two from the study institution in Korea [n=119,098 and n=245], three from public datasets [Spain, n=5213; Turkey, n=21,304; United States, n=19,405]). The same LLM was used to parse the reports into lexicon units (concise text strings representing distinct medical concepts). For each dataset, the lexical coverage rate was automatically computed as a measure of the extent to which the parsed units matched a given expression list. Additionally, 100 randomly selected reports from each dataset were manually reviewed to determine a given expression list's precision, recall, and F1 score (measures of unit-level matching performance when requiring semantic fidelity). Metrics were compared between the existing RadLex-provided expansion and the LLM-generated expansion. <b>Results:</b> The RadLex-provided expansion contained 17,515 terms. The LLM-generated expansion contained 208,465 lexical variants and 69,918 synonyms. 
For all five datasets, the LLM-generated expansion, compared with the RadLex-provided expansion, had a greater lexical coverage rate (81.9-85.6% vs 67.5-75.3%), greater recall (81.6-91.4% vs 64.0-80.3%), lower precision (94.8-98.2% vs 100.0% [all datasets]), and greater F1 score (0.91-0.95 vs 0.86-0.91). <b>Conclusion:</b> Across multinational datasets of clinical chest CT reports, the LLM-generated term expansion yielded improved lexical coverage and semantic recall, with only a small loss of semantic precision, compared with the RadLex-provided expansion. <b>Clinical Impact:</b> The LLM-based approach provides a practical and scalable solution for expanding radiology ontologies while maintaining semantic alignment; the method can aid real-world natural language processing applications.
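The metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes that the lexical coverage rate is the fraction of parsed lexicon units that exactly match an entry in an expression list after simple case and whitespace normalization, and that precision, recall, and F1 are derived from unit-level counts of true positives, false positives, and false negatives. The abstract does not specify the normalization, the matching rule, or how scores were aggregated across reports, so the function names (`normalize`, `lexical_coverage_rate`, `precision_recall_f1`) and the toy data are hypothetical and do not reproduce the reported figures.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def lexical_coverage_rate(units, expressions):
    """Fraction of parsed units that exactly match an entry in the expression list."""
    vocabulary = {normalize(e) for e in expressions}
    normalized_units = [normalize(u) for u in units]
    if not normalized_units:
        return 0.0
    matched = sum(u in vocabulary for u in normalized_units)
    return matched / len(normalized_units)


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from unit-level counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): units parsed from one report and two expression lists.
units = ["Ground-glass opacity", "GGO", "pleural effusion", "no interval change"]
base_list = ["ground-glass opacity", "pleural effusion"]
expanded_list = base_list + ["GGO"]  # expansion adds an abbreviation

print(lexical_coverage_rate(units, base_list))      # 2 of 4 units matched
print(lexical_coverage_rate(units, expanded_list))  # 3 of 4 units matched
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

In this toy case, adding a single abbreviation raises coverage from two to three of the four units, which mirrors the direction of the effect described in the abstract but not its magnitude.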
Similar works
Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support
2008 · 49,843 citations
Gene Ontology: tool for the unification of biology
2000 · 43,864 citations
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
2018 · 18,783 citations
A translation approach to portable ontology specifications
1993 · 12,445 citations
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
2005 · 11,966 citations