This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Data-centric Artificial Intelligence and Cancer Research: Construction of a Real-World Head and Neck Treatment Data Repository
Citations: 0
Authors: 18
Year: 2025
Abstract
Background and Purpose
The performance and generalisability of machine learning (ML) models rely on high-quality data. Retrospective and prospective collection of high-quality data for research use, whilst respecting data protection and patient privacy, remains a challenge in the clinical environment. Currently, months of laborious extraction and clinical annotation are often necessary before data analysis can begin. We present a novel institutional federated data lake, utilising open-source software, to facilitate efficient production of ML models from Head and Neck Cancer (HNC) imaging and Radiotherapy (RT) data. This structured pipeline dramatically reduces the time associated with the production of ML models and real-world evidence generation. This paper describes our governance-compliant processes and provides a framework for establishing similar databases.
Materials and Methods
XNAT is a powerful open-source imaging platform. Within our department, it forms part of the local secure enclave for federated learning in artificial intelligence projects and provides import, archiving, processing, search and secure distribution facilities for imaging and RT data.
Results
We have created a clinically annotated, carefully curated data lake of 2,895 consenting HNC patients containing 22,170 relevant diagnostic, staging, treatment and monitoring imaging sets. Key recommendations for replication include infrastructure planning, robust patient and data selection criteria, and prioritising patient consent and privacy.
Conclusions
This secure and extensible imaging and HNC RT database set-up promises to be an exceedingly useful tool for research, reducing the time and cost associated with the production of ML models and making the process safer, faster and more efficient.
Highlights
Real-world data is critical for building predictive clinical models.
Curation of clinical data for machine learning can necessitate months of laborious extraction and annotation.
Patients have the right to have their data handled with the highest standards of respect, security and governance.
We describe a database infrastructure constructed under a transparent and safe system of control and stewardship.
This secure and extensible structured database dramatically reduces data extraction time for AI-driven cancer research.
Related works
New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1)
2008 · 28,834 citations
TNM Classification of Malignant Tumours
1987 · 16,123 citations
A survey on deep learning in medical image analysis
2017 · 13,528 citations
Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening
2011 · 10,749 citations
The American Joint Committee on Cancer: the 7th Edition of the AJCC Cancer Staging Manual and the Future of TNM
2010 · 9,104 citations