This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Diagnostic accuracy and bias in open access and subscription-based large language models for multiple sclerosis and neuromyelitis optica spectrum disorder

2025 · 0 citations · Intelligence-Based Medicine · Open Access

Abstract

Overlapping clinical symptoms between people with multiple sclerosis (PwMS) and those with neuromyelitis optica spectrum disorder (PwNMOSD) can result in misdiagnosis. Large language models, such as ChatGPT, offer accessible tools for preliminary health guidance. We assessed the accuracy of open-access (GPT-3.5) and subscription-based (GPT-4) models in diagnosing MS and NMOSD, and the influence of key diagnostic inflection points (initial MRI findings and aquaporin-4 (AQP4) antibody testing) and subject demographics on model performance. PwMS and PwNMOSD were retrospectively identified within a single academic center, and structured clinical timelines were processed through GPT-3.5 and GPT-4. Seven digital derivatives per subject, varying race, ethnicity, and sex, were also created to assess demographic influences. ChatGPT provided one diagnosis after each timepoint, and diagnostic accuracy was determined using mixed-effects logistic regression. A total of 98 PwMS and 157 PwNMOSD were included, generating 4,080 ChatGPT conversations across models and digital derivatives. GPT-4 demonstrated higher diagnostic accuracy for MS (OR=2.67) and NMOSD (OR=1.31), relative to GPT-3.5. Accuracy improved as the clinical timeline progressed, although GPT-4 paradoxically performed worse after the initial MRI report for MS cases (OR=0.56). For PwMS, diagnostic accuracy was lower in males (OR=0.81) and older individuals (OR=0.56 per 10-year age increase). Conversely, for PwNMOSD, accuracy was higher for African American (OR=1.30) and Asian (OR=1.38) individuals. ChatGPT-4 demonstrated higher diagnostic accuracy for both diseases, but superior performance was not uniform across demographic groups. Further, the paradoxical decline in accuracy after MRI interpretation in MS cases suggests context-dependent performance, and responsible interpretation remains necessary.
• ChatGPT-4 (paid model) outperformed ChatGPT-3.5 (open access) in diagnosing MS and NMOSD
• Accuracy improved as additional events on the clinical timeline were presented
• ChatGPT-4 paradoxically underperformed after MRI reports were presented in MS cases
• Males and certain racial groups showed lower diagnostic accuracy across both models
• Inconsistent output and demographic biases underscore limitations of ChatGPT
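
The figures in the abstract can be sanity-checked with a short sketch. It rests on two assumptions that the abstract does not state outright: that each subject's original timeline plus its seven digital derivatives (eight versions in total) was run through both models, and that an odds ratio can be read against a hypothetical baseline accuracy while ignoring the random effects of the mixed model. The function names and the 50% baseline are illustrative only and do not come from the paper.

```python
def conversation_count(n_ms, n_nmosd, n_derivatives, n_models):
    """Count conversations, assuming every version of every subject
    is processed once by every model (an assumption, not a stated fact)."""
    versions_per_subject = 1 + n_derivatives  # original + derivatives
    return (n_ms + n_nmosd) * versions_per_subject * n_models


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Convert a baseline probability to the probability implied by an
    odds ratio: scale the odds, then map back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Cohort sizes reported in the abstract: 98 PwMS, 157 PwNMOSD.
print(conversation_count(98, 157, n_derivatives=7, n_models=2))  # 4080

# Hypothetical 50% baseline accuracy for GPT-3.5 on MS cases;
# OR=2.67 is the reported GPT-4 vs. GPT-3.5 odds ratio for MS.
print(round(apply_odds_ratio(0.50, 2.67), 3))  # 0.728
```

Under these assumptions the count reproduces the reported 4,080 conversations. The probability figure depends entirely on the chosen baseline and is not a result from the study.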

Topics

Multiple Sclerosis Research Studies · Artificial Intelligence in Healthcare and Education · Peripheral Neuropathies and Disorders
