This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of the Performance of ChatGPT 4.5 in LI-RADS Categorization and Management Suggestion: Zero-shot versus Few-shot Prompting Method
Citations: 0 · Authors: 2 · Year: 2025
Abstract
Objective: To evaluate whether few-shot prompting (soft-prompt-based conditioning) improves the accuracy and clinical utility of ChatGPT 4.5 in classifying hepatic lesions and recommending management according to the Liver Imaging Reporting and Data System (LI-RADS).

Methods: This cross-sectional observational study assessed ChatGPT 4.5 on fifty fictional radiology reports covering eight LI-RADS categories. The reports were evaluated under zero-shot and few-shot prompting conditions. Two board-certified radiologists independently scored the model's LI-RADS categories and management suggestions using a binary correct/incorrect system. Model performance was compared with that of a radiologist, and statistical analysis was conducted using McNemar's test, with p<0.05 considered significant.

Results: With zero-shot prompting, ChatGPT 4.5 correctly classified 84% of the LI-RADS categories and 70% of the management suggestions. Few-shot prompting improved performance, yielding 92% correct LI-RADS classification and 84% accurate management recommendations. The improvement in categorization was not statistically significant (p=0.125), but the improvement in management suggestions was (p=0.016). The radiologist comparator achieved 82% accuracy for LI-RADS classification and 60% for management suggestions. Notably, with few-shot prompting, ChatGPT 4.5 outperformed the radiologist in recommending appropriate management.

Conclusion: Few-shot prompting transforms ChatGPT 4.5 from a diagnostic assistant into a powerful tool for clinical decision-making, significantly enhancing its ability to generate patient-centered management recommendations. This study is among the earliest to benchmark ChatGPT 4.5 against a radiologist on LI-RADS-based diagnostic and management tasks, underscoring its potential not only to streamline reporting but also to elevate the quality of patient care.
As LLMs continue to evolve, they may become supportive tools in radiology, bridging image interpretation and clinical decision-making.
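The paired comparison described in the abstract uses McNemar's test on per-report correct/incorrect outcomes. As a minimal sketch, the exact (binomial) form of the test can be computed from the two discordant-pair counts; the counts below are assumptions inferred from the reported accuracies over 50 reports (84%→92% categorization, 70%→84% management), not figures taken from the paper itself:

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact (binomial) two-sided McNemar test on discordant pairs.

    b: reports scored correct only under few-shot prompting
    c: reports scored correct only under zero-shot prompting
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Two-sided tail probability of the more extreme count under Binomial(n, 0.5)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)


# Hypothetical discordant-pair counts consistent with the abstract's percentages
# (4 categorization flips and 7 management flips, all favoring few-shot):
print(mcnemar_exact(4, 0))            # 0.125
print(round(mcnemar_exact(7, 0), 3))  # 0.016
```

With these assumed counts, the sketch reproduces the abstract's p-values (0.125 for categorization, 0.016 for management), illustrating why only the management improvement crosses the p<0.05 threshold.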
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations