This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Automated Task-Specific vs General-Purpose Artificial Intelligence for Detecting Subtle Intraoperative Warning Signs During Cataract Surgery: A Multicenter Diagnostic Study
Citations: 0
Authors: 13
Year: 2026
Abstract
Importance: Early intraoperative warning signs of zonular instability during cataract surgery, such as anterior capsular radial folds, are subtle and easily missed but are clinically important for preventing surgical complications. Whether current artificial intelligence (AI) systems can reliably detect such subtle warning signs in real-world surgical video remains unknown. Recently, automated AI model generators have become available, enabling the automatic construction of task-specific AI models for individual clinical tasks.

Objective: To evaluate the diagnostic performance of general-purpose and automated task-specific AI systems for detecting anterior capsular radial folds during cataract surgery and to compare their performance with that of human clinicians.

Design, Setting, and Participants: This retrospective diagnostic study used 537 continuous curvilinear capsulorhexis (CCC) video clips collected from Beijing Tongren Hospital (China), National University Hospital (Singapore), and the OphNet-APTOS public dataset.

Exposure: Presence or absence of anterior capsular radial folds during CCC, annotated at both clip and frame levels by senior glaucoma surgeons based on expert consensus.

Main Outcomes and Measures: Discrimination between fold-positive and fold-negative cases was assessed using macro-averaged precision, recall, and F1 score at the frame and clip levels. Performance was compared among general-purpose AI systems, task-specific models generated by an automated AI model generator, and human graders with different levels of clinical experience.

Results: Among 537 video clips (mean duration, 7.32 seconds), 156 (29.1%) were fold-positive. General-purpose AI systems showed limited and inconsistent performance: the best-performing model achieved a mean F1 score of 0.519, and fine-tuned models remained inferior to human graders (maximum F1 score, 0.606). In contrast, task-specific models generated by an automated AI model generator achieved substantially higher performance (F1 score, 0.869; area under the receiver operating characteristic curve, 0.958). In a head-to-head comparison with clinicians, the top automated task-specific model (F1 score, 0.835) matched the performance of junior specialists (mean F1 score, 0.829) but remained below that of senior specialists.

Conclusions and Relevance: General-purpose AI systems do not reliably detect subtle intraoperative warning signs during cataract surgery and consistently underperform human clinicians. In contrast, recently available automated AI model generators enable the creation of task-specific models with near-junior-specialist performance. These findings suggest that clinically reliable surgical AI is more likely to be achieved through automated generation of task-specific models than through general-purpose AI systems. Although evaluated in cataract surgery, these findings highlight a broader challenge for AI in detecting brief, low-contrast intraoperative warning signs in surgical video.

Key Points

Question: How reliably can general-purpose artificial intelligence (AI) systems and task-specific AI models generated by an automated AI model generator detect subtle intraoperative warning signs during cataract surgery compared with human clinicians?

Findings: In this multicenter diagnostic study of 537 cataract surgery video clips, general-purpose AI systems were unreliable and consistently underperformed human clinicians in detecting anterior capsular radial folds. In contrast, task-specific AI models generated by an automated AI model generator, a technology that has only recently become available, achieved substantially higher diagnostic performance and matched the performance of junior specialists.

Meaning: General-purpose AI systems show limited reliability for detecting subtle intraoperative warning signs during cataract surgery. The recent availability of automated AI model generators enables a new paradigm of task-specific model development and represents a more clinically viable path for surgical decision support.
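The study reports macro-averaged precision, recall, and F1, which average per-class scores so that the minority fold-positive class (29.1% of clips) counts equally with the fold-negative class. A minimal sketch of how these macro-averaged metrics are conventionally computed for a binary label (this is an illustration with made-up labels, not the authors' code or data):

```python
# Macro-averaged precision, recall, and F1 for a binary task.
# Labels: 1 = anterior capsular radial fold present, 0 = absent.

def macro_prf1(y_true, y_pred):
    """Compute precision/recall/F1 per class, then average over both classes."""
    precisions, recalls, f1s = [], [], []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical clip-level annotations and model predictions, for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
p, r, f = macro_prf1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.733 0.733 0.733
```

The same computation can be applied at either the frame or the clip level, matching the two granularities of annotation described in the study.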