This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The next generation of evidence synthesis for diagnostic accuracy studies in artificial intelligence
Citations: 4
Authors: 4
Year: 2024
Abstract
The present landscape of artificial intelligence (AI) in clinical diagnostics is dominated by efforts to establish the diagnostic accuracy of AI tools in applications such as imaging and pathology. However, as proprietary AI systems continue to be deployed and refined, a more important consideration will be differentiating between comparable AI tools to offer greater diagnostic capability and improve clinical decision making. The confluence of AI technologies in comparable use cases mandates the development of bespoke quality assessment tools to evaluate the risk of bias, the use of index tests and reference standards with appropriate thresholds, and applicability in diagnostic accuracy studies. Such an approach could inform real-world clinical utility and, subsequently, health policy decisions. This notion of differentiating between AI tools is particularly important as AI-enabled medical devices continue to be approved by regulatory bodies, with several hundred devices currently authorised by the US Food and Drug Administration (FDA).[1] Most devices are designed to aid the detection of lesions or abnormalities in various diagnostic applications, predominantly in radiology.
For example, a number of FDA-approved lesion detection devices for screening mammograms use distinct AI technologies, including ProFound AI Software (iCAD, USA), Transpara (ScreenPoint Medical, Netherlands), and INSIGHT MMG (Lunit, South Korea).[1] Multiple large independent trials are currently ongoing to appraise AI tools in screening mammograms and have reported positive preliminary results.[2,3] Promising findings have also been shown in other large trials, such as a trial assessing AI use in electrocardiogram interpretation.[4] Once conclusive findings from all of these studies are published, the question becomes not whether AI can be integrated into clinical diagnostic workflows, but which AI device is most clinically useful for a particular use case or population cohort. The answer will depend on a number of factors most effectively characterised through rigorously conducted evidence synthesis strategies. However, currently published systematic reviews of AI diagnostic accuracy do not consistently report quality assessment standards.[5] Furthermore, the majority of systematic reviews use the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool,[6] which does not fully capture biases unique to AI technology. We have previously described potential biases relevant to AI diagnostic accuracy studies, including the use of large-scale open-source repositories; the paucity of external validation of index tests and inadequate explanation of their training, algorithmic details, and test sets; the use of inappropriate and inconsistent reference standards; and the unclear reporting of timing between index tests and reference standards.[5,7]
The challenge for regulatory bodies is to interpret the safety and efficacy of devices in the context of these biases. To streamline the clearance of clinically useful devices, the majority of recent FDA approvals of AI devices have been via the 510(k) pathway, which facilitates clearance if a device is substantially similar to a previously approved device (termed the predicate). Separately, the FDA-proposed regulatory framework for modifications to AI/machine learning software as a medical device[8] highlights the need to allow adaptive technologies to learn and improve in real time, harnessing the inherent nature of AI devices to improve gradually through iteration.
The UK Medicines and Healthcare products Regulatory Agency and the European Medicines Agency are yet to produce detailed guidance on regulatory frameworks for AI health devices but broadly aim to foster innovation in line with the USA.[9,10] However, there are several issues with relying on predicates to approve future iterations or next-generation products: the use of distinct data sources and training sets to validate algorithms; the impact of data drift, which degrades model accuracy over time; the mirroring and entrenching of existing biases that might be population-wide and unrepresentative; and differences in clinical applicability, implementation, and interoperability. Without an appreciation of these factors and how they affect the validity of reported outcomes, the ability to assess clinical utility is hindered; in effect, regulatory bodies and health policy makers will not be able to confidently conclude that these tools can be safely deployed into clinical practice. To facilitate the next generation of evidence synthesis for AI diagnostic studies in the face of these challenges, our group has commenced development of an extension to QUADAS-2 for specific use in AI diagnostic accuracy studies, named QUADAS-AI. Importantly, this tool will be developed through an international consensus on key biases that could limit translation into clinical workflows.
The role of evidence appraisal in AI diagnostics will increasingly rely on integrating data from multiple iterations of a wide range of heterogeneous AI tools. Subsequently, the need for robust and transparent evidence synthesis through tools such as QUADAS-AI will be crucial for the quality, safety, and value of adopted clinical diagnostic tools.
Declaration of interests
AD is chair for the Preemptive Medicine and Health Security Initiative at Flagship Pioneering. HA is chief scientific officer of Preemptive Health and Medicine, Flagship Pioneering. PW and AG declare no competing interests.
References
1. US Food and Drug Administration. Artificial intelligence and machine learning (AI/ML)-enabled medical devices. 2023. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (accessed November 15, 2023).
2. Dembrower K, Crippa A, Colón E, Eklund M, Strand F. Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. Lancet Digit Health 2023; 5: e703-e711.
3. Lång K, Josefsson V, Larsson A-M, et al. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol 2023; 24: 936-944.
4. Yao X, Rushlow DR, Inselman JW, et al. Artificial intelligence-enabled electrocardiograms for identification of patients with low ejection fraction: a pragmatic, randomized clinical trial. Nat Med 2021; 27: 815-819.
5. Jayakumar S, Sounderajah V, Normahani P, et al. Quality assessment standards in artificial intelligence diagnostic accuracy systematic reviews: a meta-research study. NPJ Digit Med 2022; 5: 11.
6. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155: 529-536.
7. Sounderajah V, Ashrafian H, Rose S, et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med 2021; 27: 1663-1665.
8. US Food and Drug Administration. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD). 2019. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf (accessed November 15, 2023).
9. UK Medicines and Healthcare Products Regulatory Agency. Software and AI as a medical device change programme—roadmap. 2023. https://www.gov.uk/government/publications/software-and-ai-as-a-medical-device-change-programme/software-and-ai-as-a-medical-device-change-programme-roadmap (accessed November 16, 2023).
10. European Medicines Agency. EMA regulatory science to 2025: strategic reflection. 2020. https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/ema-regulatory-science-2025-strategic-reflection_en.pdf (accessed November 16, 2023).
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations