This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
From GPT-3.5 to GPT-5.2: a paired longitudinal evaluation of large language models in clinical neurology
Citations: 0
Authors: 4
Year: 2026
Abstract
INTRODUCTION: Large language models are increasingly used to evaluate medical data and support clinical decision making, yet data on how their performance evolves across generations are limited. This study evaluated the intergenerational development of model performance using paired methods in neurology, a discipline in which the need to synthesize and contextualize complex information is high. METHODS: The scoring system and the 216-question clinical neurology question set from our previous study were reused in a methodological replication. The questions, evenly distributed across 12 subspecialties, were divided into subgroups by question type, difficulty level, and qualitative characteristics. Three independent academics analyzed the responses for accuracy and comprehensiveness. Effect sizes were calculated using matched analyses between the two generations. RESULTS: [...] < 0.001; r: 0.78) were significantly higher than those of the previous version, with effect sizes at medium to high levels. Performance improved consistently across question types, difficulty levels, and qualitative characteristics. Performance was relatively low in some subspecialties. CONCLUSION: The GPT-5.2 model demonstrated a significant performance increase over the previous model on clinical neurology questions. The increase was supported by high effect sizes, indicating potential clinical relevance. Model evolution was not homogeneous across subspecialties. Integrating the model into clinical systems with strict control mechanisms may alleviate safety concerns.