Cardiovascular risk prediction via ensemble machine learning and oversampling methods
Citations: 0
Authors: 4
Year: 2025
Abstract
Cardiovascular diseases are a leading cause of global mortality, with hypertension, obesity, and other factors contributing significantly to risk. Artificial intelligence (AI) has emerged as a valuable tool for early detection, offering predictive models that outperform traditional methods. This study analyzed a dataset of 709 individuals from Ecuador, including demographic and clinical variables, to estimate cardiovascular risk. During preprocessing, records with missing values and duplicate records were removed, and highly correlated variables were excluded to reduce multicollinearity and prevent overfitting. The performance of several machine learning algorithms (Decision Trees, Random Forest, Gradient Boosting, Extreme Gradient Boosting, LightGBM, Extra Trees, AdaBoost, and Bagging) was compared, with class imbalance addressed using SMOTE (Synthetic Minority Over-sampling Technique) and a hybrid ROS–SMOTE approach combining random oversampling (ROS) with SMOTE. Gradient Boosting with the hybrid technique achieved the best performance, with an accuracy of 0.87, a precision of 0.81, a recall of 0.74, and an F1-score of 0.75. Its superior performance is attributed to its sequential error-correction mechanism and built-in regularization strategies, which reduce overfitting and improve generalization on noisy or imbalanced datasets. These findings demonstrate the potential of AI-based models to improve the early detection and management of cardiovascular disease and highlight the importance of anthropometric, clinical, and blood pressure variables in predicting cardiovascular risk.
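The abstract names a hybrid ROS–SMOTE step but does not give its exact recipe (how the minority-class deficit is split between duplication and synthesis, the neighbour count k, or whether oversampling was restricted to training folds). The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a binary target and uses a hypothetical `ros_fraction` parameter that fills part of the deficit by random oversampling (exact duplicates) and the rest by SMOTE interpolation. The function names `smote` and `hybrid_ros_smote` and the toy feature values are illustrative only and do not come from the study.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a random minority sample and
    one of its k nearest minority-class neighbours.
    """
    if rng is None:
        rng = random.Random(0)
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def hybrid_ros_smote(X: List[Vector], y: List[int], minority_label: int,
                     ros_fraction: float = 0.5, k: int = 5,
                     seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Balance a binary dataset: ros_fraction of the minority-class deficit is
    filled by random oversampling (duplicates), the remainder by SMOTE.

    ros_fraction is a hypothetical parameter of this sketch, not a value
    taken from the study.
    """
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    deficit = n_majority - len(minority)
    X_out = [list(x) for x in X]
    y_out = list(y)
    if deficit <= 0:
        return X_out, y_out  # already balanced (or minority is larger)
    n_ros = int(round(deficit * ros_fraction))
    # ROS: duplicate randomly chosen original minority samples
    new = [list(rng.choice(minority)) for _ in range(n_ros)]
    # SMOTE: interpolate from the original minority samples only, so the
    # duplicates above never act as zero-distance neighbours
    new += smote(minority, deficit - n_ros, k=k, rng=rng)
    return X_out + new, y_out + [minority_label] * len(new)


# Toy, made-up values (not from the study): [systolic BP, BMI]
X = [[120.0, 25.1], [135.0, 28.4], [118.0, 22.0], [150.0, 31.2],
     [125.0, 24.0], [160.0, 33.5], [122.0, 23.3], [140.0, 29.9]]
y = [0, 0, 0, 1, 0, 1, 0, 0]
X_bal, y_bal = hybrid_ros_smote(X, y, minority_label=1, ros_fraction=0.5, k=1)
print(y_bal.count(0), y_bal.count(1))  # → 6 6
```

In practice, oversampling is normally applied to the training split only (inside each cross-validation fold) so that duplicated or synthetic samples do not leak into the evaluation data; the abstract does not say how the study handled this. For production use, the imbalanced-learn package offers tested `RandomOverSampler` and `SMOTE` classes in `imblearn.over_sampling` that can be chained in the same way.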