Comparative Analysis of Machine Learning Models and Key Risk Factors in Advancing Predictive Analytics for Coronary Heart Disease Over a Decade

Authors

  • Jingyang Gao

DOI:

https://doi.org/10.61173/fjx4w228

Keywords:

Coronary heart disease, Risk factors, Machine learning

Abstract

This study aimed to identify key risk factors for coronary heart disease (CHD) and to assess how well several machine learning models predict 10-year CHD risk. Using data from the Framingham Heart Study (4,238 participants, 15 potential risk factors), we first conducted an exploratory data analysis to characterize the distribution of each variable and the relationships between them. We then used chi-square and Mann-Whitney U tests to identify risk factors significantly associated with CHD. Logistic regression, random forest and support vector machine (SVM) models were built, and their prediction accuracy and area under the receiver operating characteristic curve (AUC) were evaluated. The results showed that age, systolic blood pressure and history of hypertension were the most influential risk factors. Logistic regression achieved the highest accuracy and AUC, outperforming both random forest and SVM. These findings suggest that subsequent CHD studies should pay particular attention to age, systolic blood pressure and history of hypertension, and that logistic regression is a promising model to prioritize and further optimize for CHD prediction.
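
The two significance tests and the two evaluation metrics named above can be illustrated with a small, standard-library-only Python sketch. It is not the study's code: the helper names (`chi_square_2x2`, `mann_whitney_u`, `roc_auc`, `accuracy`), the 0.5 decision threshold, and the toy counts and scores are all hypothetical. The sketch also shows a useful link between the methods: the Mann-Whitney U statistic, divided by the number of (CHD case, non-case) pairs, is exactly the AUC.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Hypothetical layout: rows = risk factor present/absent,
    columns = CHD within 10 years yes/no. Returns (statistic, p-value), 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, chi2 = Z^2, so P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def mann_whitney_u(pos: list[float], neg: list[float]) -> float:
    """U statistic for `pos` vs `neg`: count pairs where pos > neg, ties as 0.5."""
    u = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC = U / (n_cases * n_noncases), the probability a case outranks a non-case."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


def accuracy(y_true: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """Share of correct labels after thresholding predicted risk (assumed cut-off 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


if __name__ == "__main__":
    # Toy 2x2 counts, not Framingham data.
    stat, p = chi_square_2x2(30, 70, 10, 90)
    print(round(stat, 2))  # 12.5

    # Toy outcomes and predicted risks from some classifier.
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, risk))   # 0.75
    print(accuracy(y, risk))  # 0.75
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two metrics are usually reported together. Note that library routines such as SciPy's `chi2_contingency` apply Yates' continuity correction to 2x2 tables by default, so their statistic will differ slightly from the uncorrected value computed here.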

Published

2024-08-14

Section

Articles