This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
DeepSeek Outperforms GPT-4o in Multispecialty Ophthalmic Diagnosis: A Blinded Expert Evaluation of 33 Complex Cases
Citations: 0
Authors: 5
Year: 2025
Abstract
Purpose: To compare the diagnostic and treatment performance of two large language models (LLMs), DeepSeek (DS) and GPT-4o, in ophthalmology using standardized residency examination cases.

Design: Cross-sectional comparative study.

Participants: Thirty-three representative cases drawn from the Chinese Ophthalmology Residency Examination Database, covering 8 subspecialties.

Methods: Each case was submitted to DS and GPT-4o with identical prompts instructing the models to act as senior ophthalmologists. Three independent ophthalmologists conducted double-blind evaluations of each model's outputs. Accuracy was scored on a 10-point Likert scale and completeness on a 6-point Likert scale for diagnosis, differential diagnosis, and treatment. Mean scores were compared using paired statistical tests and two-way ANOVA.

Main Outcome Measures: Accuracy and completeness scores across diagnostic, differential diagnostic, and treatment tasks.

Results: Across all cases, DS achieved significantly higher accuracy than GPT-4o for diagnosis (8.04 vs 6.46, P < 0.0001), differential diagnosis (7.52 vs 5.50, P < 0.0001), and treatment (7.62 vs 6.65, P = 0.002). Completeness scores were also higher for DS in diagnosis (4.86 vs 3.69, P < 0.0001), differential diagnosis (4.44 vs 3.24, P < 0.0001), and treatment (4.61 vs 3.90, P = 0.0001). Subspecialty analyses revealed the largest advantage for DS in retinal diseases, glaucoma, strabismus and amblyopia, and optic nerve disorders.

Conclusions: In standardized ophthalmology case evaluations, DS outperformed GPT-4o in both accuracy and completeness, particularly in subspecialties requiring complex reasoning. These findings support the potential role of domain-optimized LLMs as adjuncts in ophthalmic education and clinical decision support, with further research warranted in multimodal and real-world clinical settings.
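The abstract says mean scores were compared with "paired statistical tests" but does not name the test. As a minimal sketch, assuming a paired t-test on per-case scores averaged over the three raters, the comparison could look like the following. The function names and all numbers are hypothetical illustrations, not the authors' code or study data.

```python
from math import sqrt
from statistics import mean, stdev


def average_over_raters(scores_by_rater):
    """Average each case's score across raters.

    scores_by_rater: one list per rater, each holding one score per case.
    """
    return [mean(case_scores) for case_scores in zip(*scores_by_rater)]


def paired_t_statistic(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / sqrt(n))
    return mean(diffs), t, n - 1


# Made-up accuracy scores for 4 hypothetical cases, 3 raters each (NOT study data).
model_a = [[8, 7, 9, 8], [9, 7, 8, 8], [7, 7, 10, 8]]
model_b = [[6, 6, 7, 7], [7, 5, 8, 6], [5, 7, 6, 8]]

a_means = average_over_raters(model_a)
b_means = average_over_raters(model_b)
mean_diff, t_stat, dof = paired_t_statistic(a_means, b_means)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {dof}")
```

The pairing matters because both models answered the same cases: the test works on per-case differences, which removes case difficulty as a source of variance. Turning the t statistic into a P value requires the Student t distribution, which is left out here to keep the sketch in the standard library.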