OpenAlex · Updated hourly · Last updated: 21.03.2026, 01:32

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination

2024 · 11 Citations · 4 Authors · AJO International · Open Access


Abstract

The aim of this study was to compare the performance of Google Gemini and ChatGPT-4 on a triple simulation of the European Board of Ophthalmologists (EBO) multiple-choice exam. Observational study. The EBO multiple-choice examination consists of 52 questions, each followed by 5 statements, for a total of 260 answers. Statements may be answered with “True”, “False” or “Don't Know”: a correct answer is awarded 1 point, an incorrect answer is penalized 0.5 points, and “Don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After explaining the rules to the chatbots, the entire question with its 5 statements was input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions. Google Gemini and ChatGPT-4 succeeded in the EBO exam simulations in all 3 cases, with an average of 85.3 ± 3.1 % and 83.3 ± 2.4 % correct answers, respectively. Gemini had a lower error rate than ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, p = 0.03), but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, p = 0.05). Both chatbots scored at least 70 % correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared to 199.8 ± 7.1 points for ChatGPT (p = 0.21). Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination on widespread topics, with higher accuracy than their former versions, highlighting their evolving importance in educational and informative settings. Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists, with an average of 85 % and 83 % correct answers, respectively. Google Gemini made significantly fewer errors than ChatGPT.
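For readers who want to reproduce the scoring arithmetic, the following is a minimal Python sketch of the EBO scheme as stated in the abstract (+1 per correct answer, −0.5 per incorrect answer, 0 for “Don't know”, 260 statements per simulation, 60 % correct to pass). The helper name ebo_score and the rounded example counts are illustrative assumptions, not code or data from the study's materials.

    # Minimal sketch of the EBO scoring scheme described in the abstract.
    # Assumptions: 52 questions x 5 statements = 260 answers per simulation;
    # +1 per correct answer, -0.5 per incorrect answer, 0 for "Don't know";
    # at least 60 % correct answers are needed to pass.
    # ebo_score is a hypothetical helper, not code from the study.

    def ebo_score(correct: int, incorrect: int, dont_know: int) -> tuple[float, bool]:
        total = correct + incorrect + dont_know
        assert total == 260, "one full EBO simulation has 260 statements"
        points = correct * 1.0 - incorrect * 0.5
        passed = correct / total >= 0.60  # pass threshold: 60 % correct answers
        return points, passed

    # Plugging in Gemini's average rates from the abstract
    # (85.3 % correct, 6.7 % incorrect, 8.0 % "Don't know"):
    correct = round(0.853 * 260)           # ~222 correct answers
    incorrect = round(0.067 * 260)         # ~17 errors
    dont_know = 260 - correct - incorrect  # ~21 "Don't know"
    print(ebo_score(correct, incorrect, dont_know))  # (213.5, True)

With these rounded counts the sketch reproduces Gemini's reported average of 213.5 points (222 × 1 − 17 × 0.5), consistent with the 213.5 ± 9.3 figure in the abstract.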
