This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Diagnostic performance of Prof. Valmed, ChatGPT-5 Thinking, and OpenEvidence in rheumatology: A comparative evaluation
Citations: 2
Authors: 7
Year: 2026
Abstract
To compare the diagnostic performance of a subscription-based medical large language model (LLM) certified as a medical device (Prof. Valmed), a subscription-based general-purpose LLM (ChatGPT-5 Thinking), and a freely accessible medical LLM (OpenEvidence). Sixty vignettes covering rare rheumatic diseases and differential diagnoses were entered using a standardized prompt to generate five top diagnoses and respective diagnostic probabilities. Blinded rheumatologists categorized suggested diagnoses as identical, plausible, or diagnostically different. Diagnostic accuracy was assessed using proportions of identical and plausible diagnoses and a total diagnostic score. Analyses were descriptive. Group differences for the proportion of identical top diagnoses were explored using Cochran’s Q test and post-hoc McNemar tests. Processing time was recorded. OpenEvidence produced the highest proportion of identical top diagnoses (35.0%), followed by ChatGPT-5 Thinking (26.7%) and Prof. Valmed (23.3%); however, pairwise differences were not statistically significant. ChatGPT-5 Thinking achieved the highest total diagnostic score (226), followed by OpenEvidence (221) and Prof. Valmed (212). All systems showed markedly higher diagnostic probabilities for identical versus different diagnoses. Mean processing times ranged from 20 to 36 s. All three systems demonstrated broadly comparable diagnostic accuracy and processing times. Further benchmarking that incorporates additional evaluation dimensions is essential to inform safe and effective use of LLM-based clinical support.
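The group comparison described above (Cochran's Q across the three paired systems, followed by post-hoc McNemar tests) can be sketched in Python with `statsmodels`. The binary outcomes below are simulated to loosely mirror the reported rates of identical top diagnoses (35.0%, 26.7%, 23.3%); they are illustrative assumptions, not the study's data.

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

rng = np.random.default_rng(0)
# Hypothetical per-vignette binary outcomes (1 = identical top diagnosis):
# 60 vignettes (rows) x 3 systems (columns). Rates are assumptions only.
outcomes = rng.binomial(1, [0.35, 0.267, 0.233], size=(60, 3))

# Cochran's Q test: do the three paired proportions differ overall?
q = cochrans_q(outcomes)
print(f"Cochran's Q = {q.statistic:.2f}, p = {q.pvalue:.3f}")

# Post-hoc pairwise McNemar test (here: system 0 vs. system 1) on the
# 2x2 table of concordant/discordant vignettes.
a, b = outcomes[:, 0], outcomes[:, 1]
table = [[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
         [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]]
m = mcnemar(table, exact=True)
print(f"McNemar (exact) p = {m.pvalue:.3f}")
```

McNemar's test considers only the discordant pairs (off-diagonal cells), which is why it is the standard post-hoc choice after a significant Cochran's Q on paired binary data.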
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,460 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,536 citations