OpenAlex · Updated hourly · Last updated: 12.05.2026, 03:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing AI efficacy in medical knowledge tests: A study using Taiwan's internal medicine exam questions from 2020 to 2023

2024 · 9 citations · Digital Health · Open Access
Open full text at publisher

9 citations · 6 authors · 2024

Abstract

Background: The aim of this study is to evaluate the ability of generative artificial intelligence (AI) models to handle specialized medical knowledge and problem-solving in a formal examination context.

Methods: This research used internal medicine exam questions provided by the Taiwan Internal Medicine Society from 2020 to 2023 to test three AI models: GPT-4o, Claude 3.5 Sonnet, and Gemini Advanced. Queries rejected by Gemini Advanced were translated into French and resubmitted. Performance was assessed with IBM SPSS Statistics 26: accuracy percentages were calculated, and statistical analyses such as Pearson correlation and analysis of variance (ANOVA) were performed to gauge AI efficacy.

Results: GPT-4o's top annual score was 86.25 in 2022, with an average of 81.97. Claude 3.5 Sonnet reached a peak score of 88.13 in both 2021 and 2022, averaging 84.85, while Gemini Advanced lagged with an average score of 69.84. In specific specialties, Claude 3.5 Sonnet scored highest in Psychiatry (100%) and Nephrology (97.26%), with GPT-4o performing similarly well in Hematology & Oncology (97.10%) and Nephrology (94.52%). Gemini's best scores were in Psychiatry (86.96%) and Hematology & Oncology (82.76%); Gemini Advanced struggled with Neurology, scoring below 60%. All models performed better on text-based questions than on image-based ones, though the differences were not significant. Claude 3 Opus scored highest on COVID-19-related questions at 89.29%, followed by GPT-4o at 75.00% and Gemini Advanced at 67.86%.

Conclusions: The AI models showed varied proficiency across medical specialties and question types. GPT-4o achieved a higher rate of correct answers on image-based questions, while Claude 3.5 Sonnet consistently outperformed the others overall, highlighting significant potential for AI in assisting medical education.
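The study's headline statistic is a one-way ANOVA over per-model accuracy percentages. As a minimal sketch of that comparison, the snippet below computes the F statistic from scratch for three groups of annual scores. The scores themselves are hypothetical placeholders for illustration, not the study's actual per-year data, and the study used IBM SPSS rather than Python.

```python
# One-way ANOVA F statistic, computed from scratch (no external libraries).
# The per-year accuracy values below are ILLUSTRATIVE ONLY, not from the paper.

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)                              # number of groups (models)
    n = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: weighted spread of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical annual accuracy (%) for 2020-2023, one list per model.
gpt4o  = [80.0, 84.0, 86.0, 78.0]
claude = [83.0, 88.0, 88.0, 81.0]
gemini = [68.0, 72.0, 70.0, 70.0]

f_stat = one_way_anova_f([gpt4o, claude, gemini])
print(f"F = {f_stat:.2f}")
```

The F statistic alone does not give a p-value; SPSS compares it against the F distribution with (k-1, n-k) degrees of freedom, which in pure Python would require implementing or importing the F CDF.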

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Radiomics and Machine Learning in Medical Imaging