OpenAlex · Updated hourly · Last updated: Mar 16, 2026, 11:29

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A head-to-head comparison of GPT-based prognostic predictions and oncologists in gastrointestinal cancers.

2026 · 0 citations · Journal of Clinical Oncology
Open full text at the publisher

Citations: 0
Authors: 4
Year: 2026

Abstract

808 Background: Patients and families want to know their likely survival times, but oncologists have difficulty making and communicating these estimates. Large language models (LLMs) such as ChatGPT may assist with estimating prognosis.

Methods: We conducted a retrospective pilot study using clinical data from 22 patients with advanced gastrointestinal malignancies and known survival times. A single progress note from near the time of diagnosis was de-identified and given to a gastrointestinal oncologist and a HIPAA-compliant instance of ChatGPT. The likelihood of survival at 6 months, 1 year, 2 years, and 5 years was estimated and categorized as likely (>75%), possible (25–75%), or unlikely (<25%). Predictions were analyzed in two ways. Primary analysis: predictions were scored as correct or incorrect relative to observed survival, and paired accuracy was compared with McNemar’s exact test at each timepoint; overall patient-level accuracy was compared with an exact binomial sign test. Exploratory analysis: predictions were categorized into probability bins (<25%, 25–75%, >75%), and calibration was assessed by observed survival within each bin.

Results: Among the 22 patients, median age was 59 years (range 42–80); 55% were male and 45% female, and 45% were Hispanic. Cancer types were heterogeneous, most commonly hepatocellular carcinoma (23%), gastric adenocarcinoma (18%), colorectal adenocarcinoma (18%), and gastrointestinal stromal tumor (9%); 27% were stage IV. In the primary analysis (alive: yes or no at each timepoint), ChatGPT and the oncologist achieved similar accuracy: 6 months (16/22 vs 15/22, p=1.0), 1 year (15/22 vs 20/22, p=0.125), 2 years (19/22 vs 19/22, p=1.0), and 5 years (20/22 vs 19/22, p=1.0). Overall, the oncologist outperformed ChatGPT in 8 patients, ChatGPT outperformed the oncologist in 5, and 9 were tied (p=0.58). In the exploratory bin analysis, both showed identical accuracy on confident predictions (91% at 6 months and 1 year, 90% at 2 years, 100% at 5 years). Calibration differed: ChatGPT’s >75% bins consistently corresponded to high observed survival (87–100%), reflecting a tendency toward more optimistic survival estimates, whereas the oncologist rarely used high-survival bins and often underestimated survival.

Conclusions: The oncologist was numerically more accurate overall, with a signal toward greater accuracy at 1 year, though differences were not statistically significant. ChatGPT achieved comparable performance and demonstrated superior calibration of probability estimates, supporting its potential as a complementary prognostic tool in gastrointestinal oncology.
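The paired tests named in the Methods can be sketched with exact binomial calculations. This is an illustrative implementation, not the authors’ code: the overall win counts (8 vs 5, ties dropped) come from the abstract, while the 1-year discordant split (6 vs 1) is a hypothetical configuration chosen for the example.

```python
from math import comb

def exact_binom_two_sided(k, n):
    """Two-sided exact binomial p-value at p=0.5: double the smaller
    tail P(X <= min(k, n-k)), capped at 1."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mcnemar_exact(b, c):
    """Exact McNemar test on discordant pairs: b = patients only the
    oncologist got right, c = patients only ChatGPT got right."""
    return exact_binom_two_sided(min(b, c), b + c)

def sign_test_exact(wins_a, wins_b):
    """Exact binomial sign test; ties are discarded before calling."""
    return exact_binom_two_sided(min(wins_a, wins_b), wins_a + wins_b)

# Hypothetical 1-year discordant split (6 vs 1) and the reported
# overall win counts (oncologist 8, ChatGPT 5, 9 ties discarded):
p_1yr = mcnemar_exact(6, 1)        # 0.125
p_overall = sign_test_exact(8, 5)  # ~0.581
```

Both tests reduce to the same exact binomial tail computation because, under the null, each discordant pair (or non-tied patient) is a fair coin flip.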
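The exploratory calibration check, observed survival within each predicted-probability bin, can be sketched as below. The bin labels follow the abstract; the patient data in the usage example are invented for illustration.

```python
from collections import defaultdict

def calibration_by_bin(pred_probs, survived):
    """Group predictions into the three bins from the abstract and
    report the observed survival fraction in each; a well-calibrated
    predictor puts the observed rate inside the bin's range."""
    def bin_of(p):
        if p < 0.25:
            return "unlikely (<25%)"
        if p <= 0.75:
            return "possible (25-75%)"
        return "likely (>75%)"

    counts = defaultdict(lambda: [0, 0])  # bin -> [survived, total]
    for p, alive in zip(pred_probs, survived):
        c = counts[bin_of(p)]
        c[0] += int(alive)
        c[1] += 1
    return {b: s / n for b, (s, n) in counts.items()}

# Invented example: four ">75%" predictions (3 survived) and one
# "<25%" prediction (did not survive):
rates = calibration_by_bin([0.9, 0.8, 0.85, 0.95, 0.1],
                           [1, 1, 1, 0, 0])
```

With these invented inputs the ">75%" bin shows 75% observed survival, just below its range, which is the kind of mismatch the abstract reports for the oncologist’s rarely used high-survival bins.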

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education