OpenAlex · Updated hourly · Last updated: 07.04.2026, 05:22

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluation of the effectiveness of the ChatGPT artificial intelligence application in the diagnosis of spontaneous pneumothorax on chest radiograph interpretation

2025 · 0 citations · BMC Pulmonary Medicine · Open Access
Open full text at publisher

0 citations · 7 authors · Year: 2025

Abstract

Spontaneous pneumothorax is a potentially life-threatening condition commonly diagnosed using chest radiographs. However, interpreting chest X-rays can be challenging due to anatomical overlap and observer variability. This study aimed to evaluate the diagnostic accuracy of ChatGPT, a large language model (LLM), in detecting pneumothorax on chest radiographs compared to expert thoracic surgeons. In this retrospective study, 220 chest radiographs were assessed. Expert consensus classified 110 cases with pneumothorax and 110 without. The images were uploaded to the GPT-4o model without any clinical information, and ChatGPT was asked to identify the presence or absence of pneumothorax. Diagnostic performance was evaluated by calculating sensitivity, specificity, accuracy, positive and negative predictive values, and area under the receiver operating characteristic curve (AUC). Subgroup analyses were performed based on pneumothorax size. ChatGPT demonstrated an overall diagnostic accuracy of 83.7%, sensitivity of 70.9%, specificity of 96.4%, positive predictive value of 95.1%, and negative predictive value of 76.8%. The AUC was 0.836 (95% CI: 0.780-0.893). Diagnostic performance was higher for large pneumothoraces (AUC: 0.894) compared to small pneumothoraces (AUC: 0.439). Cohen's kappa coefficient indicated substantial agreement (κ = 0.673; 95% CI: 0.575-0.771) with expert evaluations. ChatGPT demonstrates potential in detecting pneumothorax on chest radiographs, particularly in cases of large pneumothorax. However, its limited sensitivity for small pneumothoraces raises significant concerns about its reliability in clinical decision-making. Any use of ChatGPT in diagnostic workflows should be approached with caution, as unverified outputs may lead to inappropriate interventions or under-triaging. Therefore, the model is not suitable as a standalone diagnostic or triage tool. Its potential utility may lie in exploratory or supervised settings where expert oversight is available, but further validation is required before clinical implementation can be considered.
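The headline metrics in the abstract all follow from a single 2×2 confusion matrix. As a minimal sketch, the cell counts below (78 true positives, 32 false negatives, 106 true negatives, 4 false positives) are an assumption inferred from the reported percentages and the 110/110 case split, not figures stated in the abstract; under that assumption they reproduce the reported sensitivity, specificity, predictive values, and Cohen's kappa, with accuracy agreeing up to rounding.

```python
# Assumed confusion matrix, reconstructed from the abstract's percentages
# (not stated explicitly in the paper).
TP, FN = 78, 32   # 110 radiographs with pneumothorax (expert consensus)
TN, FP = 106, 4   # 110 radiographs without pneumothorax

sensitivity = TP / (TP + FN)                 # ~0.709 -> 70.9%
specificity = TN / (TN + FP)                 # ~0.964 -> 96.4%
ppv = TP / (TP + FP)                         # ~0.951 -> 95.1%
npv = TN / (TN + FN)                         # ~0.768 -> 76.8%
accuracy = (TP + TN) / (TP + TN + FP + FN)   # ~0.836 (abstract reports 83.7%)

# Cohen's kappa between the two raters (expert consensus vs. ChatGPT):
# observed agreement corrected for agreement expected by chance.
n = TP + TN + FP + FN
p_observed = accuracy
p_chance = ((TP + FN) * (TP + FP) + (TN + FP) * (TN + FN)) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)  # ~0.673, "substantial"

print(f"sens={sensitivity:.3f} spec={specificity:.3f} ppv={ppv:.3f} "
      f"npv={npv:.3f} acc={accuracy:.3f} kappa={kappa:.3f}")
```

Note that kappa (0.673) is well below raw accuracy (0.836) because, with a balanced 110/110 sample, half of all agreement is expected by chance alone.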

Topics

Artificial Intelligence in Healthcare and Education · Ultrasound in Clinical Applications · COVID-19 diagnosis using AI