This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Can OpenAI o1 Reason Well in Ophthalmology? A 6,990-Question Head-to-Head Evaluation Study
Citations: 2
Authors: 17
Year: 2025
Abstract
Question: What is the performance and reasoning ability of OpenAI o1 compared to other large language models in addressing ophthalmology-specific questions?

Findings: This study evaluated OpenAI o1 and five LLMs using 6,990 ophthalmological questions from MedMCQA. O1 achieved the highest accuracy (0.88) and macro-F1 score but ranked third in reasoning capabilities based on text-generation metrics. Across subtopics, o1 ranked first in "Lens" and "Glaucoma" but second to GPT-4o in "Corneal and External Diseases", "Vitreous and Retina", and "Oculoplastic and Orbital Diseases". Subgroup analyses showed o1 performed better on queries with longer ground truth explanations.

Meaning: O1's reasoning enhancements may not fully extend to ophthalmology, underscoring the need for domain-specific refinements to optimize performance in specialized fields like ophthalmology.
Similar works
The SCARE 2018 statement: Updating consensus Surgical CAse REport (SCARE) guidelines
2018 · 2,335 cit.
The Levels of Evidence and Their Role in Evidence-Based Medicine
2011 · 2,118 cit.
CARE guidelines for case reports: explanation and elaboration document
2017 · 2,032 cit.
The SCARE Statement: Consensus-based surgical case report guidelines
2016 · 1,681 cit.
Dermoscopy of pigmented skin lesions: Results of a consensus meeting via the Internet
2003 · 1,201 cit.