This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating AI Competence in Specialized Medicine: A Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist exam in Spain (Preprint)
Citations: 0
Authors: 2
Year: 2024
Abstract
BACKGROUND: With the rapid advancement of artificial intelligence (AI) in various fields, evaluating its application in specialized medical contexts becomes crucial. ChatGPT, a large language model developed by OpenAI, has shown potential in diverse applications, including medicine.
OBJECTIVE: This study aims to compare the performance of ChatGPT with that of attending neurologists in a real neurology specialist examination conducted in the Valencian Community, Spain, to assess the AI's capabilities and limitations in medical knowledge.
METHODS: We conducted a comparative analysis using the 2022 neurology specialist exam results from 120 neurologists and responses generated by ChatGPT versions 3.5 and 4. The exam consisted of 80 multiple-choice questions, with a focus on clinical neurology and health legislation. Questions were classified according to Bloom's Taxonomy. Statistical analysis of performance, including the Kappa coefficient for response consistency, was performed.
RESULTS: Human participants exhibited a median score of 5.91, with 32 neurologists failing to pass. ChatGPT-3.5 ranked 116th out of 122, answering 54.5% of questions correctly (score 3.94). ChatGPT-4 showed marked improvement, ranking 17th with 81.8% of correct answers (score 7.57), surpassing several human specialists. No significant variations were observed in the performance on lower-order versus higher-order questions. Additionally, ChatGPT-4 demonstrated increased inter-rater reliability, as reflected by a higher Kappa coefficient of 0.73, compared to ChatGPT-3.5's coefficient of 0.69.
CONCLUSIONS: This study underscores the evolving capabilities of AI in medical knowledge assessment, particularly in specialized fields. ChatGPT-4's performance, surpassing the median human score in a rigorous neurology exam, marks a notable advancement, suggesting its potential as an effective tool in specialized medical education and assessment.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations