This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative Evaluation of Advanced AI Reasoning Models in Pediatric Clinical Decision Support: ChatGPT O1 vs. DeepSeek-R1
43 citations
5 authors
Year: 2025
Abstract
Introduction
The adoption of advanced reasoning models such as ChatGPT O1 and DeepSeek-R1 represents a pivotal step forward in clinical decision support, particularly in pediatrics. ChatGPT O1 employs chain-of-thought (CoT) reasoning to enhance structured problem-solving, while DeepSeek-R1 introduces self-reflection capabilities through reinforcement learning. This study aimed to evaluate the diagnostic accuracy and clinical utility of these models in pediatric scenarios using the MedQA dataset.

Materials and Methods
A total of 500 multiple-choice pediatric questions from the MedQA dataset were presented to ChatGPT O1 and DeepSeek-R1. Each question included four or more options with one correct answer. The models were evaluated under uniform conditions; performance was assessed with accuracy, Cohen's kappa, and chi-square tests to measure agreement and statistical significance. Responses were analyzed to determine the models' effectiveness in addressing clinical questions.

Results
ChatGPT O1 achieved a diagnostic accuracy of 92.8%, significantly outperforming DeepSeek-R1, which scored 87.0% (p < 0.00001). The CoT reasoning technique used by ChatGPT O1 allowed for more structured and reliable responses, reducing the risk of errors. Conversely, DeepSeek-R1, while slightly less accurate, demonstrated superior accessibility and adaptability owing to its open-source nature and emerging self-reflection capabilities. Cohen's kappa (κ = 0.20) indicated low agreement between the models, reflecting their distinct reasoning strategies.

Conclusions
This study highlights the strengths of ChatGPT O1 in providing accurate and coherent clinical reasoning, making it highly suitable for critical pediatric scenarios. DeepSeek-R1, with its flexibility and accessibility, remains a valuable tool in resource-limited settings. Combining these models in an ensemble system could leverage their complementary strengths, optimizing decision support in diverse clinical contexts. Further research is warranted to explore their integration into multidisciplinary care teams and their application in real-world clinical settings.
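Cohen's kappa measures how often the two models agree per question beyond what chance alone would predict. The following is a minimal sketch in Python of how such a value can be computed; the 2×2 contingency counts here are hypothetical, chosen only to match the reported marginal accuracies (92.8% and 87.0% on 500 questions), and are not the study's actual data:

```python
def cohens_kappa(pairs):
    """Cohen's kappa for two raters with binary labels (1 = correct, 0 = incorrect)."""
    n = len(pairs)
    p_o = sum(1 for a, b in pairs if a == b) / n   # observed agreement
    pa = sum(a for a, _ in pairs) / n              # marginal: rater A correct
    pb = sum(b for _, b in pairs) / n              # marginal: rater B correct
    p_e = pa * pb + (1 - pa) * (1 - pb)            # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-question outcomes (1 = answered correctly):
#   both correct: 413, O1 only: 51, R1 only: 22, both wrong: 14
pairs = [(1, 1)] * 413 + [(1, 0)] * 51 + [(0, 1)] * 22 + [(0, 0)] * 14

o1_acc = sum(a for a, _ in pairs) / len(pairs)   # 0.928
r1_acc = sum(b for _, b in pairs) / len(pairs)   # 0.870
kappa = cohens_kappa(pairs)                      # ≈ 0.20, i.e. low agreement
```

A kappa near 0.20 despite both models being individually accurate illustrates the abstract's point: the models tend to miss different questions, which is what makes an ensemble of the two attractive.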
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations