Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Ran Score: a LLM-based Evaluation Score for Radiology Report Generation
0
Zitationen
16
Autoren
2026
Jahr
Abstract
Chest X-ray report generation and automated evaluation are limited by poor recognition of low-prevalence abnormalities and inadequate handling of clinically important language, including negation and ambiguity. We develop a clinician-guided framework combining human expertise and large language models for multi-label finding extraction from free-text chest X-ray reports and use it to define Ran Score, a finding-level metric for report evaluation. Using three non-overlapping MIMIC-CXR-EN cohorts from a public chest X-ray dataset and an independent ChestX-CN validation cohort, we optimize prompts, establish radiologist-derived reference labels and evaluate report generation models. The optimized framework improves the macro-averaged score from 0.753 to 0.956 on the MIMIC-CXR-EN development cohort, exceeds the CheXbert benchmark by 15.7 percentage points on directly comparable labels, and shows robust generalization on the ChestX-CN validation cohort. Here we show that clinician-guided prompt optimization improves agreement with a radiologist-derived reference standard and that Ran Score enables finding-level evaluation of report fidelity, particularly for low-prevalence abnormalities.
Ähnliche Arbeiten
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19.284 Zit.
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14.297 Zit.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8.772 Zit.
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7.387 Zit.
scikit-image: image processing in Python
2014 · 6.823 Zit.