OpenAlex · Updated hourly · Last updated: 23.04.2026, 04:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Clinical Performance Tradeoffs of ChatGPT-5.2 Thinking (OpenAI) Compared with Radiologist Interpretation in Biopsy-Referred Mammography: Cancer Detection, False Positives, and Laterality

2026 · 0 citations · 8 authors · Tomography · Open Access

Abstract

Background/Objectives: Breast cancer screening such as mammography supports earlier detection, but variability in interpretation can still lead to missed cancers and avoidable follow-up testing. We evaluated ChatGPT-5.2 Thinking (OpenAI) as a stand-alone model for examination-level malignancy classification on standard bilateral mammography views in a biopsy-referred cohort, compared it with breast radiologists, and assessed laterality performance.

Methods: We conducted a retrospective, multicenter diagnostic-accuracy study across breast imaging centers in Saudi Arabia. From an upstream screened cohort (n = 1225), we constructed a biopsy-referred test set of 100 mammography examinations (four 2D views per exam: bilateral CC and MLO; 400 images), comprising 61 biopsy-confirmed malignancies and 39 biopsy-negative controls, with pathology as the reference standard. Radiologists were blinded to pathology and AI outputs and assigned BI-RADS categories (0–5) and suspected laterality. ChatGPT-5.2 interpreted the same de-identified views using a BI-RADS-guided prompt to generate BI-RADS categories and laterality. Sensitivity, specificity, accuracy, and laterality concordance were then estimated.

Results: ChatGPT-5.2 had higher sensitivity than radiologists (95.08% vs. 81.97%) but markedly lower specificity (10.26% vs. 56.41%), resulting in lower overall accuracy (62.00% vs. 72.00%). The AI produced 58 true positives, 35 false positives, and 3 false negatives, while radiologists produced 50 true positives, 17 false positives, and 11 false negatives. Laterality accuracy among malignant examinations was 60.66%.

Conclusions: In this pathology-anchored, biopsy-referred evaluation, ChatGPT-5.2 identified more cancers but generated substantially more false-positive classifications and showed only moderate breast-side localization. These findings support its use as a concurrent aid or prioritization tool rather than a stand-alone reader, and motivate efforts to improve specificity and laterality before prospective validation.
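The reported percentages follow directly from the confusion-matrix counts in the abstract (true negatives are the 39 biopsy-negative controls minus the false positives). A minimal sketch reproducing that arithmetic:

```python
# Reproduce the abstract's metrics from confusion-matrix counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = tp / (tp + fn) * 100        # cancers correctly flagged
    specificity = tn / (tn + fp) * 100        # controls correctly cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn) * 100
    return round(sensitivity, 2), round(specificity, 2), round(accuracy, 2)

# ChatGPT-5.2: 58 TP, 35 FP, 3 FN -> 4 TN (39 controls - 35 FP)
print(metrics(58, 35, 3, 4))    # (95.08, 10.26, 62.0)

# Radiologists: 50 TP, 17 FP, 11 FN -> 22 TN (39 controls - 17 FP)
print(metrics(50, 17, 11, 22))  # (81.97, 56.41, 72.0)
```

Both tuples match the sensitivity, specificity, and accuracy figures reported in the Results section.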
