
Real-world validation of artificial intelligence algorithms for ophthalmic imaging

2021 · 18 citations · 2 authors · The Lancet Digital Health · Open Access

Abstract

With the explosion of artificial intelligence (AI) algorithms in medical imaging, the lifecycle of AI development is well accepted and includes training, internal validation, and external validation. Although training and internal validation can be done with retrospective datasets, it is imperative that testing be done using independent data. This requirement addresses the basic problem of model overfitting, in which an algorithm performs well within the training data environment but does not sustain that accuracy at the testing stage. Validation data for medical imaging studies need to be representative of the target population and account for geographical factors, temporal factors, disease prevalence, racial and gender diversity, camera systems, image specifications, and acquisition specifications. Despite being validated on independent datasets, a deep-learning diabetic-retinopathy screening model developed by Google Health faced challenges when deployed in a clinic workflow (Gulshan et al, JAMA 2016; 316: 2402–2410; Beede et al, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems). After testing on real-world datasets, some algorithms have been found to show performance bias, with accuracy varying by pigmentation (Burlina et al, Transl Vis Sci Technol 2021; 10: 13). The US Food and Drug Administration (FDA) requires AI testing to use prospective data for final model testing, as was the case with two FDA-approved AI models for screening diabetic retinopathy (Abràmoff et al, NPJ Digit Med 2018; 1: 39; Eyenuk announcement: https://www.eyenuk.com/us-en/articles/diabetic-retinopathy/eyenuk-announces-eyeart-fda-clearance/, accessed June 24, 2021). There is a dearth of clinical studies rigorously validating algorithms in the real world, and more well designed prospective studies, ideally with a locked algorithm, are needed to truly understand the strengths and drawbacks of a given algorithm in clinical practice (Nagendran et al, BMJ 2020; 368: m689). Several state-of-the-art automated diabetic-retinopathy screening systems were compared head to head on a real-world dataset and showed performance differences, with sensitivities ranging between 50·98% and 85·90% (Lee et al, Diabetes Care 2021). This finding emphasised the need for multiple levels of validation, diversity of datasets, and careful consideration of the reference standards against which AI output is compared.
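The head-to-head sensitivity differences described above come down to comparing each system's referral decisions against a common reference standard. A minimal sketch of that computation, using entirely synthetic data and hypothetical system names:

```python
# Hypothetical illustration of head-to-head screening validation: each AI
# system's binary output (1 = referable disease) is scored against the same
# reference standard. All labels below are synthetic, not study data.

def sensitivity_specificity(predictions, reference):
    """Compute sensitivity and specificity from paired binary labels."""
    tp = sum(1 for p, r in zip(predictions, reference) if p == 1 and r == 1)
    fn = sum(1 for p, r in zip(predictions, reference) if p == 0 and r == 1)
    tn = sum(1 for p, r in zip(predictions, reference) if p == 0 and r == 0)
    fp = sum(1 for p, r in zip(predictions, reference) if p == 1 and r == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Reference standard from adjudicated human grading (synthetic example).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Two hypothetical AI systems graded on the same images.
systems = {
    "system_A": [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    "system_B": [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
}

for name, preds in systems.items():
    sens, spec = sensitivity_specificity(preds, reference)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Even on identical images, the two invented systems trade sensitivity against specificity, which is why the choice of reference standard and operating point matters so much when comparing algorithms.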
The Article by Duoru Lin and coworkers (Lin et al, Lancet Digit Health 2021; 3: e486–e495) in The Lancet Digital Health features a deep-learning system called the Comprehensive AI Retinal Expert (CARE) system, a single convolutional neural network trained to identify 14 common retinal abnormalities as well as normal fundus. The training dataset was methodically planned, with labelling by ophthalmologists and retinal experts of varying years of experience. All ophthalmologists were trained to use standardised labelling, and grading was done with a triple-read and arbitration method. More than 200 000 fundus images were obtained from 16 clinical settings across China, including tertiary hospitals, community hospitals, and physical examination centres. The system was comprehensively validated on 18 136 prospectively collected images from 35 sites in China, again covering a range of settings. Performance of CARE was compared with that of nine ophthalmologists experienced in fundus disease, who served as the reference standard, and with four groups of real-world ophthalmologists with a range of experience. In addition, model performance was tested on imaging data from participants of non-Chinese ethnicities and on images from different cameras. Mean accuracy of CARE in the external dataset was 0·968 (SD 0·037), similar to that of ophthalmologists and consistent across ethnicities. The multidisease model was also compared with single-disease-labelled binary models and was found to have better accuracy (mean AUC 0·952 [SD 0·047] vs 0·921 [SD 0·087]). The authors explained that including all disease labels in one network enables the AI model to learn disease correlations and diagnostic logic, including for conditions such as drusen, neovascularisation, and geographic atrophy.
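The AUC comparison above is computed per disease label and then averaged across labels. A small self-contained sketch of how a per-label AUC can be obtained (via the rank-sum statistic, with no external libraries); the scores and the framing of "one head of a shared multi-label network" versus "a dedicated binary model" are invented for illustration:

```python
# Illustrative sketch with synthetic scores: comparing per-label AUC for one
# disease head of a multi-label network vs a dedicated single-disease model.
# None of these numbers come from the CARE study.

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive case is ranked above a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]  # 1 = disease present in reference grading

multilabel_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # shared-network head
binary_scores     = [0.9, 0.5, 0.6, 0.7, 0.3, 0.1]  # dedicated binary model

print("multi-label AUC:", auc(multilabel_scores, labels))  # 1.0
print("binary AUC:", round(auc(binary_scores, labels), 3))
```

Repeating this per label and averaging gives the mean AUC (with its SD) that the study reports for each model family.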
They concluded that the model performance was satisfactory and that the system could be implemented in clinical care. Although real-world validation on prospective, diverse data appears to be the best possible way to test AI models, implementation into the clinical workflow is a separate and important factor to evaluate (He et al, Nat Med 2019; 25: 30–36). In their study, Lin and coworkers do not discuss the intended purpose of the algorithm, which could range from screening in primary care to triaging in specialty clinics. Even with real-world data validation, other factors in the clinical workflow, such as disease spectrum and prevalence, can affect the performance of an AI algorithm, and implementation validation will need to be considered as a separate step. In most AI model development, the focus is on metrics that establish model accuracy for disease or abnormality classification. Quality assessment is integral to classification systems based on medical imaging at all stages of AI development; in this study, an image quality-control model was not applied during training but was introduced as a separate workflow during external validation. In addition, a so-called uncertain category needs to be implemented, particularly as classifiers evolve from single diseases to several diseases (Kompa et al, NPJ Digit Med 2021; 4: 4). Model accuracy does not necessarily translate to improvement in patient outcomes.
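One common way to realise the "uncertain" category described above is to accept the classifier's top label only when its confidence clears a threshold, and otherwise route the image to human review. A minimal sketch, in which the threshold value and class names are illustrative assumptions rather than anything from the CARE system:

```python
# Minimal sketch of an "uncertain" category: defer low-confidence predictions
# to human review instead of forcing a disease label. The threshold (0.70)
# and class names are illustrative assumptions, not from the CARE system.

UNCERTAIN_THRESHOLD = 0.70  # illustrative choice; tuned per deployment

def classify_with_uncertainty(probabilities, threshold=UNCERTAIN_THRESHOLD):
    """Return the top class if its probability clears the threshold,
    otherwise flag the image as 'uncertain' for human review."""
    top_class = max(probabilities, key=probabilities.get)
    if probabilities[top_class] < threshold:
        return "uncertain"
    return top_class

confident = {"normal": 0.05, "drusen": 0.90, "geographic_atrophy": 0.05}
ambiguous = {"normal": 0.40, "drusen": 0.35, "geographic_atrophy": 0.25}

print(classify_with_uncertainty(confident))  # drusen
print(classify_with_uncertainty(ambiguous))  # uncertain
```

The deferral rate this produces becomes a workflow parameter in its own right: a lower threshold means fewer images sent for review but more forced, possibly wrong, labels.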
Automated diabetic-retinopathy screening algorithms can relieve the burden placed on teleophthalmology health-care staff. However, improvement in patient outcomes and reduction in diabetic retinopathy-related vision loss are yet to be established. Development of a retinal image screening algorithm that recognises 14 abnormalities does not mean that the patient will reach an ophthalmologist when needed, or that the algorithm will work as well in a given clinical workflow, but it is a step in the right direction.

We declare no competing interests.
