This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative Evaluation of Encoder- and Decoder-Based Models for Actionable Findings in CT Reports
Citations: 0
Authors: 5
Year: 2026
Abstract
To automate the notification workflow for actionable computed tomography (CT) reports, we investigated model configurations that maintained detection performance even with a very low positive rate. We assembled a dataset of 1000 head CT reports and compared the performance of an encoder model (ModernBERT-ja) with that of a decoder model (Llama-3-ELYZA-JP-8B). The encoder underwent supervised fine-tuning (SFT) using multiple loss functions, while the decoder was evaluated through few-shot prompting, zero-shot chain-of-thought (CoT), and SFT. We systematically decreased the positive rate in the training data to 14.4%, 10%, 5%, and 2% to assess model robustness. The encoder SFT with class-weighted focal loss achieved an F1 score of 0.870 at a positive rate of 14.4%, surpassing the decoder's best few-shot configuration (10 demonstrations, label0_first; F1 = 0.717). However, encoder performance declined substantially when the positive rate dropped below 5%. The decoder improved with only a few demonstrations, and the ordering of positive demonstrations played a crucial role in few-shot prompting. Applying SFT to the decoder raised the F1 score to 0.853, approaching that of the encoder but incurring higher training expenses. Neither architecture maintained detection capability when the positive rate was 2%. Zero-shot CoT performed less effectively than few-shot prompting under all imbalanced conditions. With sufficient positive cases, encoder SFT yields the best performance. When sampling the minority class is difficult, decoder few-shot prompting without additional training is a practical alternative. These results guide the choice between encoder fine-tuning and decoder prompting for highly imbalanced radiology-report triage of actionable findings.
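The abstract names class-weighted focal loss as the best-performing encoder objective but does not give its exact form. The following is a minimal sketch of one common binary formulation, assuming inverse-frequency class weights and a focusing parameter gamma of 2.0. The function names, the weighting scheme, and the value of gamma are illustrative assumptions, not the authors' implementation.

```python
import math


def inverse_frequency_weights(labels):
    """Return per-class weights n / (2 * count_c) for binary labels.

    Assumption: inverse-frequency weighting; the paper's actual
    weighting scheme is not stated in the abstract.
    """
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * neg), 1: n / (2 * pos)}


def weighted_focal_loss(probs, labels, weights, gamma=2.0, eps=1e-7):
    """Mean class-weighted focal loss for binary classification.

    probs   : predicted probability of the positive class per example
    labels  : 0/1 ground-truth labels
    weights : dict mapping class label -> weight
    gamma   : focusing parameter (hypothetical default); gamma = 0
              reduces the loss to weighted cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability of the true class
        total += -weights[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


# Example: 1000 reports with a 14.4% positive rate, as in the abstract.
labels = [1] * 144 + [0] * 856
weights = inverse_frequency_weights(labels)
```

With these weights, a misclassified positive report contributes more to the loss than a misclassified negative one, and the `(1 - p_t) ** gamma` factor shrinks the contribution of examples the model already classifies confidently.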