This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Artificial Intelligence (AI) in rheumatology: a comparative evaluation of the ChatGPT and DeepSeek application
0
Citations
2
Authors
2026
Year
Abstract
The continuous increase in Artificial Intelligence (AI) applications across many areas of human life has brought great changes to many sciences, including the health sector. ChatGPT and DeepSeek are Large Language Models (LLMs) developed using supervised and reinforcement learning techniques. The aim of this article is to evaluate the accuracy and consistency of the ChatGPT and DeepSeek models in the diagnosis and treatment of two rheumatologic diseases, ankylosing spondylitis (axSpA) and psoriatic arthritis (PsA). Both ChatGPT and the DeepSeek chat system have transformed information retrieval and are among the fastest-growing platforms. They are effective tools that produce text responses to human prompts with high accuracy, accessibility, and low cost, but their use has raised many questions about their reliability. The evaluation in this article compares the responses obtained from the two models with clinical findings in axSpA and PsA using four statistical tests. Specifically, the responses were compared with clinical data from 116 patients hospitalized for rheumatological diseases at the Rheumazentrum Ruhrgebiet in Herne, by calculating the differences in mean estimates, Cohen's Kappa coefficient, Fleiss' Kappa coefficient, and the confidence level corresponding to the differences in mean values, together with several other statistical indicators. Regarding the comparison of mean values, agreement for the three cases examined is very good in some results, satisfactory in others, and shows large differences in the rest.
The calculation of Cohen's Kappa coefficient indicated no agreement between the clinical results and the answers of ChatGPT and DeepSeek, specifically the GPT-5 and DeepSeek-R1 models. The Fleiss' Kappa coefficient likewise showed no satisfactory agreement between the clinical data and the models' answers. The results obtained from several other statistical indicators, as well as the probabilities corresponding to the differences between the mean values of the two models' results and the clinical findings, are similar. The final results and quantitative assessments of the analysis showed that the responses of the ChatGPT and DeepSeek models have moderate validity, reliability, and utility in providing information to patients with axSpA and PsA. Therefore, information obtained from these models should be used only after evaluation and validation by physicians and cross-checking of the recommendations against current clinical guidelines.
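For readers unfamiliar with the two agreement statistics named in the abstract, the following is a minimal sketch of how Cohen's Kappa (two raters, e.g. a model versus a clinician) and Fleiss' Kappa (several raters per item) are computed. The ratings below are illustrative examples only, not the study's data.

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    labels = set(r1) | set(r2)
    p_e = sum(c1[l] * c2[l] for l in labels) / (n * n)     # chance agreement
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning item i to category j."""
    N = len(counts)
    m = sum(counts[0])  # raters per item (assumed constant across items)
    k = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (N * m) for j in range(k)]
    P_e = sum(p ** 2 for p in p_j)
    P_bar = sum((sum(n * n for n in row) - m) / (m * (m - 1)) for row in counts) / N
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical binary ratings (1 = diagnosis confirmed, 0 = not confirmed):
clinician = [1, 1, 0, 1, 0, 0, 1, 0]
model     = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohen_kappa(clinician, model))  # → 0.5 (moderate agreement)
```

By convention, kappa values near 0 indicate agreement no better than chance, which is how the abstract's "no agreement" findings would be read on this scale.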
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,292 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,143 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,539 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,452 citations