OpenAlex · Updated hourly · Last updated: 13.03.2026, 20:12

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

DeepSeek Outperforms GPT-4o in Multispecialty Ophthalmic Diagnosis: A Blinded Expert Evaluation of 33 Complex Cases

2025 · 0 citations · Open Access
Citations: 0 · Authors: 5 · Year: 2025

Abstract

Purpose: To compare the diagnostic and treatment performance of DeepSeek (DS) and GPT-4o large language models (LLMs) in ophthalmology using standardized residency examination cases.

Design: Cross-sectional comparative study.

Participants: Thirty-three representative cases drawn from the Chinese Ophthalmology Residency Examination Database, covering 8 subspecialties.

Methods: Each case was processed by DS and GPT-4o with identical prompts to act as senior ophthalmologists. Three independent ophthalmologists conducted double-blind evaluations of each model's outputs. Accuracy was scored on a 10-point Likert scale and completeness on a 6-point Likert scale for diagnosis, differential diagnosis, and treatment. Mean scores were compared using paired statistical tests and two-way ANOVA.

Main Outcome Measures: Accuracy and completeness scores across diagnostic, differential diagnostic, and treatment tasks.

Results: Across all cases, DS achieved significantly higher accuracy for diagnosis (8.04 vs 6.46, P < 0.0001), differential diagnosis (7.52 vs 5.50, P < 0.0001), and treatment (7.62 vs 6.65, P = 0.002) compared with GPT-4o. Completeness scores were also superior for DS in diagnosis (4.86 vs 3.69, P < 0.0001), differential diagnosis (4.44 vs 3.24, P < 0.0001), and treatment (4.61 vs 3.90, P = 0.0001). Subspecialty analyses revealed the largest advantage for DS in retinal diseases, glaucoma, strabismus & amblyopia, and optic nerve disorders.

Conclusions: In standardized ophthalmology case evaluations, DS outperformed GPT-4o in both accuracy and completeness, particularly in subspecialties requiring complex reasoning. These findings support the potential role of domain-optimized LLMs as adjuncts in ophthalmic education and clinical decision support, with further research warranted in multimodal and real-world clinical settings.
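The abstract says mean scores were compared with paired statistical tests but does not name the specific test. As a minimal sketch of what a paired comparison on per-case scores can look like, the Python below computes a paired t statistic and an exact sign-flip permutation p-value using only the standard library. The function names and the score lists are hypothetical illustrations; the numbers are made up and are not data from the study.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_differences(scores_a, scores_b):
    """Per-case differences between two matched score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs equally long score lists")
    return [a - b for a, b in zip(scores_a, scores_b)]


def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, paired t statistic)."""
    diffs = paired_differences(scores_a, scores_b)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, mean_diff / standard_error


def sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided p-value from flipping the sign of every difference.

    Under the null hypothesis of no systematic difference, each paired
    difference is equally likely to be positive or negative.
    """
    diffs = paired_differences(scores_a, scores_b)
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / total


# Hypothetical per-case accuracy scores (illustration only, NOT study data).
model_a = [8.0, 7.5, 9.0, 6.5, 8.5, 7.0]
model_b = [6.0, 6.5, 7.0, 6.0, 7.5, 5.0]

mean_diff, t_stat = paired_t_statistic(model_a, model_b)
p_value = sign_flip_p_value(model_a, model_b)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, p = {p_value}")
```

With these illustrative numbers every case favors the first model, so only 2 of the 64 sign patterns are at least as extreme as the observed one and the permutation p-value is 0.03125. The permutation approach is chosen here only because it needs no third-party library; it is not a claim about the authors' actual analysis.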

Similar works