This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Diagnostic performance of Prof. Valmed, ChatGPT-5 Thinking, and OpenEvidence in rheumatology: A comparative evaluation
Citations: 2
Authors: 7
Year: 2026
Abstract
To compare the diagnostic performance of a subscription-based medical large language model (LLM) certified as a medical device (Prof. Valmed), a subscription-based general-purpose LLM (ChatGPT-5 Thinking), and a freely accessible medical LLM (OpenEvidence). Sixty vignettes covering rare rheumatic diseases and differential diagnoses were entered using a standardized prompt to generate five top diagnoses and respective diagnostic probabilities. Blinded rheumatologists categorized suggested diagnoses as identical, plausible, or diagnostically different. Diagnostic accuracy was assessed using proportions of identical and plausible diagnoses and a total diagnostic score. Analyses were descriptive. Group differences for the proportion of identical top diagnoses were explored using Cochran’s Q test and post-hoc McNemar tests. Processing time was recorded. OpenEvidence produced the highest proportion of identical top diagnoses (35.0%), followed by ChatGPT-5 Thinking (26.7%) and Prof. Valmed (23.3%); however, pairwise differences were not statistically significant. ChatGPT-5 Thinking achieved the highest total diagnostic score (226), followed by OpenEvidence (221) and Prof. Valmed (212). All systems showed markedly higher diagnostic probabilities for identical versus different diagnoses. Mean processing times ranged from 20 to 36 s. All three systems demonstrated broadly comparable diagnostic accuracy and processing times. Further benchmarking that incorporates additional evaluation dimensions is essential to inform safe and effective use of LLM-based clinical support.
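The group comparison described above (Cochran's Q across the three paired systems, followed by post-hoc McNemar tests) can be sketched in Python with `statsmodels`. The binary outcomes below are simulated to loosely mirror the reported rates of identical top diagnoses (35.0%, 26.7%, 23.3%); they are illustrative assumptions, not the study's data.

```python
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

rng = np.random.default_rng(0)
# Hypothetical per-vignette binary outcomes (1 = identical top diagnosis):
# 60 vignettes (rows) x 3 systems (columns). Rates are assumptions only.
outcomes = rng.binomial(1, [0.35, 0.267, 0.233], size=(60, 3))

# Cochran's Q test: do the three paired proportions differ overall?
q = cochrans_q(outcomes)
print(f"Cochran's Q = {q.statistic:.2f}, p = {q.pvalue:.3f}")

# Post-hoc pairwise McNemar test (here: system 0 vs. system 1) on the
# 2x2 table of concordant/discordant vignettes.
a, b = outcomes[:, 0], outcomes[:, 1]
table = [[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
         [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]]
m = mcnemar(table, exact=True)
print(f"McNemar (exact) p = {m.pvalue:.3f}")
```

McNemar's test considers only the discordant pairs (off-diagonal cells), which is why it is the standard post-hoc choice after a significant Cochran's Q on paired binary data.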
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,460 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,536 citations