OpenAlex · Updated hourly · Last updated: 23.04.2026, 04:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Clinical Performance Tradeoffs of ChatGPT-5.2 Thinking (OpenAI) Compared with Radiologist Interpretation in Biopsy-Referred Mammography: Cancer Detection, False Positives, and Laterality

2026 · 0 citations · 8 authors · Tomography · Open Access

Abstract

Background/Objectives: Breast cancer screening such as mammography supports earlier detection, but variability in interpretation can still lead to missed cancers and avoidable follow-up testing. We evaluated ChatGPT-5.2 Thinking (OpenAI) as a stand-alone model for examination-level malignancy classification on standard bilateral mammography views in a biopsy-referred cohort, compared it with breast radiologists, and assessed laterality performance.

Methods: We conducted a retrospective, multicenter diagnostic-accuracy study across breast imaging centers in Saudi Arabia. From an upstream screened cohort (n = 1225), we constructed a biopsy-referred test set of 100 mammography examinations (four 2D views per exam: bilateral CC and MLO; 400 images), comprising 61 biopsy-confirmed malignancies and 39 biopsy-negative controls, with pathology as the reference standard. Radiologists were blinded to pathology and AI outputs and assigned BI-RADS categories (0–5) and suspected laterality. ChatGPT-5.2 interpreted the same de-identified views using a BI-RADS-guided prompt to generate BI-RADS categories and laterality. Sensitivity, specificity, accuracy, and laterality concordance were then estimated.

Results: ChatGPT-5.2 had higher sensitivity than radiologists (95.08% vs. 81.97%) but markedly lower specificity (10.26% vs. 56.41%), resulting in lower overall accuracy (62.00% vs. 72.00%). The AI produced 58 true positives, 35 false positives, and 3 false negatives, while radiologists produced 50 true positives, 17 false positives, and 11 false negatives. Laterality accuracy among malignant examinations was 60.66%.

Conclusions: In this pathology-anchored, biopsy-referred evaluation, ChatGPT-5.2 identified more cancers but generated substantially more false-positive classifications and showed only moderate breast-side localization. These findings support its use as a concurrent aid or prioritization tool rather than a stand-alone reader, and motivate efforts to improve specificity and laterality before prospective validation.
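The reported percentages follow directly from the confusion-matrix counts in the abstract (true negatives are the 39 biopsy-negative controls minus the false positives). A minimal sketch reproducing that arithmetic:

```python
# Reproduce the abstract's metrics from confusion-matrix counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = tp / (tp + fn) * 100        # cancers correctly flagged
    specificity = tn / (tn + fp) * 100        # controls correctly cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn) * 100
    return round(sensitivity, 2), round(specificity, 2), round(accuracy, 2)

# ChatGPT-5.2: 58 TP, 35 FP, 3 FN -> 4 TN (39 controls - 35 FP)
print(metrics(58, 35, 3, 4))    # (95.08, 10.26, 62.0)

# Radiologists: 50 TP, 17 FP, 11 FN -> 22 TN (39 controls - 17 FP)
print(metrics(50, 17, 11, 22))  # (81.97, 56.41, 72.0)
```

Both tuples match the sensitivity, specificity, and accuracy figures reported in the Results section.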
