OpenAlex · Updated hourly · Last updated: 13.03.2026, 00:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ExCaPT: Explainable Cancer Prediction with Transformer-based models

2025 · 0 citations · Open Access

Citations: 0
Authors: 7
Year: 2025

Abstract

Cancer remains one of the most significant global health challenges. Despite advances in treatment, early detection remains a critical concern. The increasing availability of Electronic Health Records (EHR) offers a unique opportunity to improve our understanding of patient health trajectories and to develop more accurate risk prediction models. However, the complexity and heterogeneity of EHR data make analysis and modeling difficult. Over the years, a range of models, from traditional machine learning to advanced deep learning (DL) approaches, has been employed to address the multidimensional complexity of health data. Notably, transformer-based models have emerged as a promising solution for capturing longitudinal, sequential, and multimodal data. This work introduces ExCaPT, a transformer encoder-based predictive model designed to identify individuals at higher risk of developing colorectal cancer (CRC) while providing interpretable outputs. The model uses a comprehensive dataset that includes age, sex, smoking status, and longitudinal EHR data such as disease and drug trajectories. On a test dataset, ExCaPT achieved a ROC-AUC of 85.9 ± 0.1, a sensitivity of 68.1 ± 0.3, and a specificity of 82.2 ± 0.1. These results outperform those of an LSTM model used as a reference for sequence data, which highlights the potential of transformer-based models for the early identification of high-risk cancer patients and marks an important step forward in precision healthcare. Additionally, we employed several explainability approaches, including attention-based, embedding-based, and integrated gradients analyses. These allowed us to identify key input features, visualize latent representations, and quantify the contributions of different features to the predictions, providing complementary insights into the model’s decision-making process.
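The three metrics reported in the abstract can be illustrated with a minimal sketch. The labels, scores, and 0.5 threshold below are made-up toy values, not data from the paper, and the function names are hypothetical. The functions return fractions in [0, 1].

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given decision threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    # Sensitivity: share of true cases that are flagged.
    # Specificity: share of non-cases that are correctly left unflagged.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 3 cases, 5 controls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(y_true, y_score, threshold=0.5)
auc = roc_auc(y_true, y_score)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} roc_auc={auc:.3f}")
```

ROC-AUC is independent of the threshold, whereas sensitivity and specificity trade off against each other as the threshold moves. This is why a single model can report a fairly high AUC alongside a more modest sensitivity.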
Highlights

ExCaPT, a transformer-encoder model using demographic, disease, and medication trajectories from EHR data, predicts early colorectal cancer (CRC) risk with high performance.
The model provides interpretability through attention scores, embedding-based analyses, and integrated gradients to reveal influential features.
This modeling framework is generalizable and can be extended to prediction for other cancer types.
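Integrated gradients, one of the explainability methods named above, attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input. The sketch below applies the idea to a toy logistic model with NumPy. It is not the authors' implementation: the weights, the all-zeros baseline, and the three features are hypothetical stand-ins for a transformer and its EHR inputs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Toy risk model (assumption): logistic regression on a feature vector."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_features)
    grads = np.stack([model_grad(point, w, b) for point in path])
    return (x - baseline) * grads.mean(axis=0)


# Hypothetical weights and input; the third feature has zero weight.
w = np.array([1.5, -2.0, 0.0])
b = -0.5
x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions should sum to F(x) - F(baseline).
gap = attributions.sum() - (model(x, w, b) - model(baseline, w, b))
print("attributions:", attributions)
print("completeness gap:", gap)
```

The completeness check is a useful sanity test when applying the method to a larger model. If the attributions do not sum to roughly the difference between the prediction and the baseline prediction, the number of integration steps is probably too small.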

Similar works