This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
DeepSeek R1 excels in diagnosing previously misdiagnosed cases
Citations: 1
Authors: 15
Year: 2025
Abstract
Misdiagnosis remains a critical challenge in clinical practice, particularly in complex cases. While artificial intelligence (AI) has shown promise in medical diagnostics, its capability to rectify previously misdiagnosed cases has not been thoroughly examined. This study evaluates the diagnostic performance of five AI models (DeepSeek R1, ChatGPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, and Meta Llama 3.3) using misdiagnosed cases from the China Clinical Case Database. Each model generated ten ranked differential diagnoses per case, and accuracy was scored from 10 (correct diagnosis ranked first) to 0 (not in the top 10). The models’ diagnostic performance was compared across disease categories, and inter-model agreement was assessed using Cohen’s Kappa. Among 227 analyzed cases, DeepSeek R1 achieved the highest diagnostic accuracy (65.6%), followed by Claude 3.5 Sonnet (61.2%), Gemini 2.0 Flash (59.0%), GPT-4o (39.6%), and Meta Llama 3.3 (36.1%). DeepSeek R1 also showed the strongest agreement with Gemini 2.0 Flash (κ = 0.561, 95% CI: 0.449–0.673). These results indicate that DeepSeek R1 excels in identifying and correcting misdiagnosed cases and highlight the potential of AI models to improve diagnostic accuracy in complex clinical scenarios, emphasizing the importance of model selection in AI-assisted diagnosis.
Highlights
• Comparative study of five AI models in correcting misdiagnosed cases.
• DeepSeek R1 achieved the highest accuracy (65.6%) and ranking performance.
• DeepSeek R1 and Gemini 2.0 Flash showed the highest agreement (κ = 0.561).
• AI models varied in effectiveness, highlighting the importance of model selection.
• Findings support AI’s role in reducing diagnostic errors in complex cases.
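The rank-based scoring and the agreement statistic named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract gives only the endpoints of the scale (10 for a correct diagnosis ranked first, 0 for one outside the top 10), so the linear mapping `11 - rank` for the ranks in between is an assumption, as is the choice to compute Cohen's Kappa on per-case binary correct/incorrect labels. The function names `rank_score` and `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def rank_score(rank: Optional[int]) -> int:
    """Score one case from the rank (1-10) of the correct diagnosis.

    None, or any rank outside 1-10, means the correct diagnosis was not
    in the model's top 10 and scores 0. The linear scale between the two
    endpoints stated in the abstract is an assumption.
    """
    if rank is None or not 1 <= rank <= 10:
        return 0
    return 11 - rank  # rank 1 -> 10, rank 10 -> 1


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's Kappa for two raters labelling the same cases."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both label sequences must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for illustration only: 1 = correct diagnosis in top 10, 0 = not.
model_a = [1, 1, 0, 1, 0, 0]
model_b = [1, 0, 0, 1, 0, 1]
kappa = cohen_kappa(model_a, model_b)
```

A confidence interval such as the one reported for κ would need an additional step (for example a bootstrap over cases), which this sketch leaves out.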