This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation
Citations: 3
Authors: 26
Year: 2024
Abstract
Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors in report delivery. While recent progress in automated report generation with vision-language models offers clear potential to ameliorate this situation, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To measure the quality of the AI-generated reports, we perform an expert evaluation, the largest in scale and diversity to date, by engaging a group of 27 certified radiologists in the United States and India to provide detailed assessments of AI-generated and human-written reports from an intensive care setting as well as an inpatient setting. We observe a wide distribution of preferences across the panel, ranging from full consensus to dissensus, across clinical settings and regions, with 55.4% of Flamingo-CXR intensive care reports evaluated to be preferable or equivalent to clinician reports by half or more of the panel, rising to 77.7% for outpatient X-rays overall and to 94% for the subset of cases with no pertinent abnormal findings. For reports that contain errors, we develop an assistive setting, the first demonstration of clinician-AI collaboration for radiology report composition, and we observe a synergistic improvement across all clinical settings. Altogether, these nuanced evaluations reveal disparities between the AI system and radiologists, identify areas for potential clinical utility and pave the way toward a collaborative system that enhances the clinical accuracy of radiology reporting.
Similar works
Refinement and reassessment of the SERVQUAL scale.
1991 · 3,966 citations
Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review
2005 · 3,754 citations
Radiobiology for the Radiologist.
1974 · 3,501 citations
International evidence-based recommendations for point-of-care lung ultrasound
2012 · 2,806 citations
Radiation Dose Associated With Common Computed Tomography Examinations and the Associated Lifetime Attributable Risk of Cancer
2009 · 2,426 citations
Authors
- Ryutaro Tanno
- David Barrett
- Andrew Sellergren
- Sumedh Ghaisas
- Sumanth Dathathri
- Abigail See
- Johannes Welbl
- K. K. Singhal
- Shekoofeh Azizi
- Tao Tu
- Mike Schaekermann
- R. May
- Roy Lee
- SiWai Man
- Zahra S. Ahmed
- S. Sara Mahdavi
- Yossi Matias
- Joëlle Barral
- Ali Eslami
- Danielle Belgrave
- Vivek Natarajan
- Shravya Shetty
- Pushmeet Kohli
- Po-Sen Huang
- Alan Karthikesalingam
- Sofia Ira Ktena