Soroush Oskouei
University of Eastern Finland, Department of Applied Physics, Kuopio 70210, Finland
Title: Improving a CHD Prediction Model, Reducing Necessary Predictive Factors, and Statistical Analysis of the Data
Biography: Soroush Oskouei
Abstract
Coronary heart disease (CHD) occurs when the arteries of the heart cannot deliver enough oxygen-rich blood to the heart. In 2020, it was reported to be a leading cause of death in the United States. In this work, the most important risk factors for CHD are re-ranked using a classification model that predicts the ten-year CHD risk with high accuracy. The data were taken from the Framingham cardiovascular study dataset available on Kaggle. Given the nature of the data, a multivariate analysis was needed, along with class-wise comparisons and regressions, to investigate possible correlations between features. Multivariate t-tests and classification could identify (or rule out) correlations and the true effects of features on each other and on the final prediction (ten-year CHD risk). Factor analysis and principal component analysis (PCA) were also conducted to better reveal the importance of features. To make the prediction, three classification models were trained and combined in a voting classifier: logistic regression, Gaussian naive Bayes, and random forest. Several sets of weights were tried for the voting classifier, and the set that gave the best accuracy was used. The previous prediction model was improved by more than 5 percent in accuracy, and the most important predictive factors were identified. The PCA revealed that the most important factors are total cholesterol, systolic blood pressure, heart rate, glucose, and the combination of cigarettes per day and heart rate.
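The weighted voting step described in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the author's code. The synthetic data from `make_classification` stands in for the Framingham table, and the soft-voting mode, the candidate weight grid, the feature scaling, and the held-out validation split are all assumptions made for illustration.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, imbalanced data stands in for the Framingham dataset.
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_voter(weights):
    """Combine the three base classifiers named in the abstract."""
    estimators = [
        # Scaling before logistic regression is an illustrative choice.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("gnb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]
    # Assumption: soft voting (weighted average of predicted probabilities).
    return VotingClassifier(estimators=estimators, voting="soft", weights=list(weights))


# Try every weight triple from a small hypothetical grid and keep the one
# with the best accuracy on the held-out validation split.
best_weights, best_accuracy = None, -1.0
for weights in product([1, 2, 3], repeat=3):
    accuracy = build_voter(weights).fit(X_train, y_train).score(X_val, y_val)
    if accuracy > best_accuracy:
        best_weights, best_accuracy = weights, accuracy

print("best weights (lr, gnb, rf):", best_weights)
print("validation accuracy:", round(best_accuracy, 3))
```

Selecting the weights on a validation split rather than on the training data keeps the comparison between weightings honest. A separate test set would still be needed to report the final accuracy.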