This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Comparative Evaluation of Multiplatform AI Performance on Practical Ophthalmology Exam Questions: Insights from the Brazilian Council of Ophthalmology Exam
Citations: 0
Authors: 13
Year: 2025
Abstract
In recent years, advances in artificial intelligence (AI), especially the emergence of natural language models and deep neural networks, have transformed medical practice, offering tools with the potential to assist both in diagnosis and in specialized medical training. The main objective of this study was to evaluate the accuracy and agreement of different AI models in solving practical questions from the Brazilian Council of Ophthalmology (CBO) Exam. To this end, the performance of five AI models (ChatGPT, Gemini, DeepSeek, Google AI Studio, and GROK) was analyzed on a set of 560 questions distributed across eight thematic blocks of ophthalmology (Cornea, Cataract, Retina, Glaucoma, Neuro-ophthalmology, Optics and Refraction, Strabismus, and Plastic Surgery/Lacrimal Duct/Orbit). The answers were compared with the official answer key by calculating the percentage of correct responses; Cohen's Kappa coefficient was used to measure agreement between each AI's responses and the official key, and Fleiss's Kappa to measure overall agreement among the different AIs. The most notable finding was that the Gemini model achieved the highest accuracy (77.6%) and the highest agreement with the official answer key. Significant variation in performance between blocks was also observed, with greater accuracy in the Retina and Glaucoma themes and lower accuracy in the Strabismus and Plastic Surgery blocks. The thematic analysis identified patterns of correct answers by specialty, revealing weaknesses of the models in areas that depend more heavily on visual assessment and clinical subjectivity. Beyond their probable educational applicability, AIs proved viable as complementary tools in medical training, especially when used under supervision and with defined pedagogical objectives.
Therefore, it was concluded that, despite these limitations, the most up-to-date models, trained on specific clinical data, were able to faithfully reproduce diagnostic reasoning in several areas of ophthalmology, evidencing their potential for integration into specialized education, provided they are used with technical and ethical criteria. These findings suggest that AI can serve as a supplementary tool in ophthalmic education, with caution warranted in more subjective specialties.
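The agreement measure described in the abstract (Cohen's Kappa between a model's answers and the official answer key) can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical ten-question answer set, not the study's actual data or code:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa between two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of marginal frequencies
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example: official answer key vs. one AI model's answers
# (both sequences are hypothetical, for illustration only)
key = list("ABCDABCDAB")
ai  = list("ABCDABCDBA")
print(round(cohen_kappa(key, ai), 3))  # → 0.73
```

Unlike raw percent-correct, kappa discounts agreement that would occur by guessing alone, which is why the study reports it alongside accuracy; Fleiss's Kappa generalizes the same idea to agreement among more than two raters (here, the five AI models).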
Related Works
Global data on visual impairment in the year 2002.
2004 · 4,205 citations
Two cortical visual systems
1982 · 4,150 citations
Neuropsychological assessment, 3rd ed.
1995 · 2,827 citations
Neuropsychological assessment, 4th ed.
2004 · 1,942 citations
Causes of vision loss worldwide, 1990–2010: a systematic analysis
2013 · 1,848 citations
Authors
- Déborah Silva Nunes
- Joacy Pedro Franco David
- José Jesu Sisnando D’Araújo Filho
- Kelly Cristina Costa Guedes Nascimento
- Igor Jordan Barbosa Coutinho
- Rebeca Andrade Ferraz
- Maria Isabel Muniz Zemero
- Syenne Pimentel Fayal
- Ana Caroline Coelho dos Passos
- Luis Eduardo de Carvalho Barros
- Rodrigo Rodrigues Virgolino
- George de Almeida Marques
- Vitor Hugo Auzier Lima