This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A head-to-head comparison of GPT-based prognostic predictions and oncologists in gastrointestinal cancers.
Citations: 0
Authors: 4
Year: 2026
Abstract
808 Background: Patients and families want to know their likely survival times, but oncologists have difficulty making these estimates and communicating this information. Large language models (LLMs) such as ChatGPT may assist with estimating prognosis.

Methods: We conducted a retrospective pilot study using clinical data from 22 patients with advanced gastrointestinal malignancies and known survival time. A single progress note from near the time of diagnosis was de-identified and given to a gastrointestinal oncologist and a HIPAA-compliant instance of ChatGPT. Likelihood of survival at 6 months, 1 year, 2 years, and 5 years was estimated and categorized as likely (>75%), possible (25–75%), or unlikely (<25%). Predictions were analyzed in two ways. Primary analysis: predictions were scored as correct/incorrect relative to observed survival, and paired accuracy was compared with McNemar's exact test at each timepoint. Overall patient-level accuracy was compared with an exact binomial sign test. Exploratory analysis: predictions were categorized into probability bins (<25%, 25–75%, >75%). Calibration was assessed by observed survival within each bin.

Results: Among 22 patients, median age was 59 years (range 42–80); 55% were male and 45% female, and 45% were Hispanic. Cancer types were heterogeneous, most commonly hepatocellular carcinoma (23%), gastric adenocarcinoma (18%), colorectal adenocarcinoma (18%), and gastrointestinal stromal tumor (9%); 27% were stage IV. In the primary analysis (alive: yes or no at each timepoint), ChatGPT and the oncologist achieved similar accuracy: 6 months (16/22 vs 15/22, p=1.0), 1 year (15/22 vs 20/22, p=0.125), 2 years (19/22 vs 19/22, p=1.0), and 5 years (20/22 vs 19/22, p=1.0). Overall, the oncologist outperformed ChatGPT in 8 patients, ChatGPT outperformed the oncologist in 5, and 9 were tied (p=0.58). In the exploratory bin analysis, both showed identical confident performance (accuracy 91% at 6 months and 1 year, 90% at 2 years, 100% at 5 years). Calibration differed: ChatGPT's >75% bins consistently corresponded to high observed survival (87–100%), reflecting a tendency toward more optimistic survival estimates, whereas the oncologist rarely used high-survival bins and often underestimated survival.

Conclusions: The oncologist was numerically more accurate overall, with a signal toward greater accuracy at 1 year, though differences were not statistically significant. ChatGPT achieved comparable performance and demonstrated superior calibration of probability estimates, supporting its potential as a complementary prognostic tool in gastrointestinal oncology.
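The two paired tests described in the Methods can be sketched with SciPy. The sign-test counts below (oncologist better in 8 patients, ChatGPT in 5, with 9 ties excluded) come from the Results and reproduce the reported p=0.58. The McNemar discordant-pair counts are hypothetical illustrations, since the abstract reports only the marginal accuracies (e.g., 15/22 vs 20/22 at 1 year), not the full paired table.

```python
from scipy.stats import binomtest

# Exact binomial sign test over the 13 non-tied patients
# (oncologist better: 8, ChatGPT better: 5; ties excluded), as in the Results.
sign_p = binomtest(5, n=13, p=0.5).pvalue
print(f"sign test p = {sign_p:.2f}")  # ≈ 0.58, matching the abstract

# An exact McNemar test uses only the discordant pairs (patients where
# exactly one predictor was correct). These counts are HYPOTHETICAL:
only_oncologist_correct = 6  # hypothetical discordant count
only_chatgpt_correct = 1     # hypothetical discordant count
mcnemar_p = binomtest(min(only_oncologist_correct, only_chatgpt_correct),
                      n=only_oncologist_correct + only_chatgpt_correct,
                      p=0.5).pvalue
print(f"McNemar exact p = {mcnemar_p:.3f}")
```

Both tests reduce to an exact two-sided binomial test with p=0.5: the sign test on non-tied patients, and McNemar's exact test on discordant pairs.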
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations