This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating AI Reasoning Models in Pediatric Medicine: A Comparative Analysis of o3-mini and o3-mini-high
4
Citations
5
Authors
2025
Year
Abstract
Artificial intelligence (AI) plays an increasingly important role in modern medicine, particularly in clinical decision support. This study compares the performance of two OpenAI reasoning models, o3-mini and o3-mini-high, in answering 900 pediatric clinical questions derived from the MedQA-USMLE dataset. The evaluation focuses on accuracy, response time, and consistency to determine their effectiveness in pediatric diagnostic and therapeutic decision-making. The results indicate that o3-mini-high achieves higher accuracy (90.55% vs. 88.3%) and faster response times (64.63 seconds vs. 71.63 seconds) than o3-mini. A chi-square test confirmed that these differences are statistically significant (χ² = 328.9675, p < 0.00001). Error analysis revealed that o3-mini-high corrected more of o3-mini's errors than vice versa, but the two models shared 61 common errors, suggesting intrinsic limitations in training data or model architecture. Accessibility differences between the models were also considered. While DeepSeek-R1, evaluated in a previous study, offers unrestricted free access, OpenAI’s o3 models have message limits, which may affect their suitability in resource-constrained environments. Future improvements should aim at reducing shared errors, improving o3-mini’s accuracy while maintaining efficiency, and refining o3-mini-high for enhanced performance. An ensemble approach that leverages both models’ strengths could provide a more robust AI-driven clinical decision support system, particularly in time-sensitive pediatric scenarios such as emergency care and neonatal intensive care units.
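The reported chi-square statistic can be sanity-checked from the figures in the abstract. The following is a minimal sketch, not the authors' analysis code: the per-cell counts are assumptions reconstructed by rounding the stated accuracies over 900 questions and using the 61 shared errors, and the test is assumed to be an uncorrected Pearson chi-square on the 2x2 table of per-question outcomes. All variable and function names are hypothetical.

```python
# Reconstructed counts: ASSUMPTIONS derived from the abstract's rounded
# percentages, not data taken from the paper itself.
TOTAL = 900
HIGH_CORRECT = round(0.9055 * TOTAL)  # o3-mini-high correct answers
MINI_CORRECT = round(0.883 * TOTAL)   # o3-mini correct answers
BOTH_WRONG = 61                       # shared errors reported in the abstract

# Fill in the remaining cells of the paired 2x2 table.
only_high_wrong = (TOTAL - HIGH_CORRECT) - BOTH_WRONG
only_mini_wrong = (TOTAL - MINI_CORRECT) - BOTH_WRONG
both_correct = TOTAL - BOTH_WRONG - only_high_wrong - only_mini_wrong


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: o3-mini-high correct / wrong. Columns: o3-mini correct / wrong.
chi2 = pearson_chi_square_2x2(
    both_correct, only_mini_wrong, only_high_wrong, BOTH_WRONG
)
print(f"chi-square = {chi2:.4f}")
```

Under these assumptions the statistic comes out at approximately 328.97, in line with the value reported above. In this form the test measures the association between the two models' per-question outcomes; a paired test such as McNemar's, applied to the discordant cells, would be the usual way to test the accuracy difference itself.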
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,292 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,143 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,539 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,452 citations