This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Who Matters More in Radiology Report Generation: Vision Encoders or Language Models?

2025 · 0 citations

Abstract

The rapid development of Multimodal Large Language Models (MLLMs) has advanced Radiology Report Generation (RRG). While much of this progress is driven by increasingly powerful Large Language Models (LLMs), the roles of both the vision encoder and the LLM remain underexplored, especially in domain-specific contexts. In this work, we systematically study how different vision encoders and LLMs affect RRG performance, analyzing the task from both vision- and language-centric perspectives. Through extensive evaluation, we show that domain-adapted vision encoders and LLMs significantly enhance the quality and clinical relevance of generated reports. These findings offer practical guidance for building effective MLLMs in medical imaging.

Topics

Radiology practices and education
Multimodal Machine Learning Applications
Artificial Intelligence in Healthcare and Education
