This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A multitask framework for automated interpretation of multi-frame right upper quadrant ultrasound in clinical decision support
Citations: 0
Authors: 29
Year: 2026
Abstract
Ultrasound is a cornerstone of emergency and hepatobiliary imaging, yet its interpretation remains highly operator-dependent and time-sensitive. Here, we present a multitask vision-language model (VLM) agent developed to assist with comprehensive right upper quadrant (RUQ) ultrasound interpretation across the full diagnostic workflow. The system was trained on a large, multi-center dataset comprising a primary cohort from Johns Hopkins Medical Institutions (9,189 cases, 594,099 images) and externally validated on cohorts from Stanford University (108 cases, 3,240 images) and a major Chinese medical center (257 cases, 3,178 images). Built on the Qwen2.5-VL-7B architecture, the agent integrates frame-level visual understanding with report-grounded language reasoning to perform three tasks: (i) classification of 18 hepatobiliary and gallbladder conditions, (ii) generation of clinically coherent diagnostic reports, and (iii) surgical decision support based on ultrasound findings and clinical data. The model achieved high diagnostic accuracy across all tasks, generated reports that were indistinguishable from expert-written versions in blinded evaluations, and demonstrated superior factual accuracy and information density on content-based metrics. The agent further identified patients requiring cholecystectomy with high precision, supporting real-time decision-making. These results highlight the potential of generalist vision-language models to improve diagnostic consistency, reporting efficiency, and surgical triage in real-world ultrasound practice.
Related Works
MizAR 60 for Mizar 50
2023 · 74,517 citations
ImageNet: A large-scale hierarchical image database
2009 · 60,624 citations
Microsoft COCO: Common Objects in Context
2014 · 41,271 citations
Fully convolutional networks for semantic segmentation
2015 · 36,380 citations
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,463 citations
Authors
- Haiman Guo
- Cheng-Yi Li
- Yuli Wang
- Robin Wang
- Yuwei Dai
- Qinghai Peng
- Danming Cao
- Luoyun Wang
- Thao Vu
- Wu Xing
- Chengzhang Zhu
- Christopher Tan
- Jacob Schick
- Stephen Kwak
- Farzad Sedaghat
- Javad Azadi
- James Facciola
- Jinwen Feng
- Dilek Öncel
- Ulrike M. Hamper
- Alex Zihao Zhu
- Tej Mehta
- Melissa M. Leimkuehler
- Cheng Ting Lin
- Zhicheng Jiao
- Ihab Kamel
- Jing Wu
- Li Yang
- Harrison Bai