This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Utilizing ChatGPT for assessing disease volume in patients (pts) with metastatic prostate cancer (mPC).
Citations: 0
Authors: 15
Year: 2025
Abstract
Background: Artificial intelligence (AI) has transformed many aspects of healthcare, particularly medical imaging analysis. In mPC, accurate identification of bone metastases is crucial for guiding pt management. This proof-of-concept study explores the application of ChatGPT-4o, a multimodal generative pre-trained transformer (large language model) with emerging image-recognition capabilities, in diagnosing bone metastases from bone scans and determining disease volume in pts with mPC.

Methods: In this IRB-approved, retrospective study, pts with newly diagnosed mPC and bone-predominant disease were randomly selected. Pts who had received bone-protecting agents, chemotherapy, hormone therapy, or radiation therapy prior to the bone scan were excluded. High-volume versus low-volume bone disease was classified from the radiologist reports using the CHAARTED criteria. The bone scan images were then uploaded to ChatGPT, which was instructed to act as a radiologist and asked to classify the disease volume. ChatGPT's classifications were compared with the radiologists' findings using Cohen's kappa. A confusion matrix was used to derive the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of ChatGPT.

Results: A total of 110 pts were selected, with a median age of 67 years (range: 44 to 89). The majority were Caucasian (98.2%). The median Gleason score was 9, and 77.3% of pts had de novo metastatic disease. The median PSA value at diagnosis was 35.2 ng/ml (range: 1-5843). High-volume disease was present in 52% of pts. ChatGPT's overall concordance with the radiologists was 74% (Cohen's kappa = 0.65). For high-volume disease, ChatGPT demonstrated a high sensitivity of 92.3% (48/52). However, it misclassified 45.8% of low-volume cases as high-volume (specificity: 54.2%). The PPV and NPV of ChatGPT were 68.6% and 86.7%, respectively. Interestingly, ChatGPT's accuracy was significantly higher when each case was analyzed individually (77.7%) than when cases were grouped in a single conversation (43.5%, p<0.001).

Conclusions: This hypothesis-driven research demonstrated that ChatGPT-4o has the potential to read medical images independently and apply existing knowledge, such as the CHAARTED criteria, to assist in content generation. However, its reliability remains a significant challenge for broader use in healthcare. Notably, it exhibited "information fatigue": accuracy declined significantly when similar or repetitive information accumulated within a single conversation.
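The agreement statistics described in the Methods (Cohen's kappa, sensitivity, specificity, PPV, NPV, overall concordance) can be reproduced from paired radiologist/ChatGPT labels. The sketch below is illustrative only and not the study's actual analysis code; it assumes binary labels (1 = high-volume, 0 = low-volume per CHAARTED) and uses toy placeholder label vectors rather than the study data.

```python
# Illustrative sketch: agreement metrics between radiologist reports (ground truth)
# and ChatGPT-4o classifications of CHAARTED disease volume.
# Labels: 1 = high-volume, 0 = low-volume. The vectors below are toy placeholders.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

radiologist = [1, 1, 0, 0, 1, 0, 1, 0]  # toy ground-truth labels
chatgpt     = [1, 1, 1, 0, 1, 0, 1, 1]  # toy ChatGPT classifications

kappa = cohen_kappa_score(radiologist, chatgpt)

# Confusion matrix with high-volume (1) treated as the positive class.
tn, fp, fn, tp = confusion_matrix(radiologist, chatgpt, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)                 # fraction of true high-volume cases detected
specificity = tn / (tn + fp)                 # fraction of true low-volume cases detected
ppv = tp / (tp + fp)                         # positive predictive value
npv = tn / (tn + fn)                         # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall concordance

print(f"kappa={kappa:.2f}, sens={sensitivity:.1%}, spec={specificity:.1%}, "
      f"PPV={ppv:.1%}, NPV={npv:.1%}, accuracy={accuracy:.1%}")
```

With the study's real label vectors, this computation would yield the reported values (kappa 0.65, sensitivity 92.3%, specificity 54.2%, PPV 68.6%, NPV 86.7%, concordance 74%).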