This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
RadFig-VQA: A Multi-imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
Citations: 0
Authors: 7
Year: 2025
Abstract
Current radiology benchmarks predominantly feature mixed general medical images rather than specialized radiological content, limiting comprehensive evaluation of vision-language models in clinical practice. We introduce RadFig-VQA, a comprehensive radiology-specific benchmark comprising 70,550 radiological images and 238,294 question-answer pairs systematically extracted from PubMed Central open-access papers. Our approach employs automated figure classification and multi-source question generation that extracts relevant descriptions from full paper content beyond figure captions, enabling diverse questions across multiple imaging modalities (CT, MRI, X-ray, Ultrasound, PET, SPECT, Mammography, Angiography) and clinical categories. Our fine-tuned Qwen2.5-VL 3B model achieves state-of-the-art performance with 85.4% overall accuracy, representing a significant advancement toward more realistic evaluation of vision-language models in radiology that better reflects real-world clinical complexity. The dataset is available at https://huggingface.co/datasets/YYama0/RadFig-VQA .
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,336 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,207 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,607 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,476 citations