This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of the Performance of ChatGPT 4.5 in LI-RADS Categorization and Management Suggestion: Zero-shot versus Few-shot Prompting Method
Citations: 0 · Authors: 2 · Year: 2025
Abstract
Objective: To evaluate whether few-shot prompting (soft-prompt-based conditioning) improves the accuracy and clinical utility of ChatGPT 4.5 in classifying hepatic lesions and recommending management according to the Liver Imaging Reporting and Data System (LI-RADS).

Methods: This cross-sectional observational study assessed ChatGPT 4.5 on fifty fictional radiology reports covering eight LI-RADS categories. The reports were evaluated under zero-shot and few-shot prompting conditions. Two board-certified radiologists independently scored the model's LI-RADS categories and management suggestions using a binary correct/incorrect system. Model performance was compared with that of a radiologist, and statistical analysis was conducted using McNemar's test, with p<0.05 considered significant.

Results: With zero-shot prompting, ChatGPT 4.5 correctly classified 84% of the LI-RADS categories and 70% of the management suggestions. Few-shot prompting improved performance, yielding 92% correct LI-RADS classification and 84% accurate management recommendations. The improvement in categorization was not statistically significant (p=0.125), but the improvement in management suggestions was (p=0.016). The radiologist comparator achieved 82% accuracy for LI-RADS classification and 60% for management suggestions. Notably, with few-shot prompting, ChatGPT 4.5 outperformed the radiologist in recommending appropriate management.

Conclusion: Few-shot prompting transforms ChatGPT 4.5 from a diagnostic assistant into a powerful tool for clinical decision-making, significantly enhancing its ability to generate patient-centered management recommendations. This study is among the earliest to benchmark ChatGPT 4.5 against a radiologist on LI-RADS-based diagnostic and management tasks, underscoring its potential not only to streamline reporting but also to elevate the quality of patient care.
As LLMs continue to evolve, they may become supportive tools in radiology, bridging image interpretation and clinical decision-making.
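The paired comparison described in the abstract uses McNemar's test on per-report correct/incorrect outcomes. As a minimal sketch, the exact (binomial) form of the test can be computed from the two discordant-pair counts; the counts below are assumptions inferred from the reported accuracies over 50 reports (84%→92% categorization, 70%→84% management), not figures taken from the paper itself:

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact (binomial) two-sided McNemar test on discordant pairs.

    b: reports scored correct only under few-shot prompting
    c: reports scored correct only under zero-shot prompting
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Two-sided tail probability of the more extreme count under Binomial(n, 0.5)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)


# Hypothetical discordant-pair counts consistent with the abstract's percentages
# (4 categorization flips and 7 management flips, all favoring few-shot):
print(mcnemar_exact(4, 0))            # 0.125
print(round(mcnemar_exact(7, 0), 3))  # 0.016
```

With these assumed counts, the sketch reproduces the abstract's p-values (0.125 for categorization, 0.016 for management), illustrating why only the management improvement crosses the p<0.05 threshold.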
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations