This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Can Open-Source Large Language Models Detect Medical Errors in Real-World Ophthalmology Reports?
0
Citations
12
Authors
2025
Year
Abstract
Accurate documentation is critical in ophthalmology, yet clinical notes often contain subtle errors that can affect decision-making. This study prospectively compared contemporary large language models (LLMs) for detecting clinically salient errors in emergency ophthalmology encounter notes and generating actionable corrections. A total of 129 de-identified notes, each seeded with a predefined target error, were independently audited by four LLMs using a standardized prompt: o3 (OpenAI, closed-source), DeepSeek-v3-r1 (DeepSeek, open-source), MedGemma-27B (Google, open-source), and GPT-4o (OpenAI, closed-source). Two masked ophthalmologists graded error localization, relevance of additional issues, and overall recommendation quality, with within-case analyses applying appropriate nonparametric tests. Performance varied significantly across models (Cochran’s Q = 71.13, p = 2.44 × 10⁻¹⁵). o3 achieved the highest error localization accuracy at 95.7% (95% CI, 89.5–98.8), followed by DeepSeek-v3-r1 (90.3%), MedGemma-27B (80.9%), and GPT-4o (53.2%). Ordinal outcomes similarly favored o3 and DeepSeek-v3-r1 (both p < 10⁻⁹ vs. GPT-4o), with mean recommendation quality scores of 3.35, 3.05, 2.54, and 2.11, respectively. These findings demonstrate that LLMs can serve as accurate “second eyes” for ophthalmology documentation. A proprietary model led on all metrics, while a strong open-source alternative approached its performance, offering potential for privacy-preserving on-premise deployment. Clinical translation will require oversight, workflow integration, and careful attention to ethical considerations.
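The abstract reports a Cochran’s Q statistic for the within-case comparison of binary error-localization outcomes across the four models. As a rough illustration of how such a statistic is computed, the following minimal sketch implements the textbook formula in plain Python. The function name `cochrans_q` and the toy 0/1 matrix are assumptions made for illustration only; they are not the study’s data or code, and the study’s actual analysis pipeline is not described on this page.

```python
def cochrans_q(outcomes):
    """Cochran's Q for matched binary outcomes.

    outcomes: list of rows, one per case (note); each row holds one
    0/1 value per model (1 = seeded error correctly localized).
    Returns (Q, degrees_of_freedom).
    """
    k = len(outcomes[0])  # number of models compared
    if any(len(row) != k for row in outcomes):
        raise ValueError("every case needs one outcome per model")

    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    grand_total = sum(row_totals)

    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when all models agree on every case")
    return numerator / denominator, k - 1


# Hypothetical toy data: 6 cases x 4 models (NOT the study's results).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
q, df = cochrans_q(toy)
print(q, df)
```

Under the null hypothesis that all models have the same success probability, Q approximately follows a chi-square distribution with k − 1 degrees of freedom, which is where a p-value like the one quoted in the abstract would come from.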
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations