To view report: An Investigation Of The Framingham Heart Study
-
Purpose: This R Markdown document demonstrates my understanding of logistic regression and R. This report is part one of two articles describing the logit model.
A. Logistic Regression Report B. Under construction
-
Data: Framingham Heart Disease Study, FHS_data
-
Conclusion: We identify seven (7) factors associated with cardiovascular disease, along with their approximate odds.
No. | Factors | Approximate Odds Over Mean |
---|---|---|
1 | Prevalence Of Stroke In Family History | 240% |
2 | Male Vs Female | 150% |
3 | Prevalence Of Hypertension In Family History | 130% |
4 | Age | < 2,800% |
5 | Cigarettes Per Day | < 210% |
6 | Systolic Blood Pressure | < 780% |
7 | Glucose Levels | < 250% |
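
A common way to turn fitted logistic regression coefficients into odds figures like those in the table is to exponentiate them. The sketch below is illustrative Python using only the standard library. It assumes the percentages are odds ratios expressed as percentages, which the report does not confirm. The function names and coefficient values are hypothetical and are not taken from the report.

```python
import math

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when a predictor with
    log-odds coefficient `beta` increases by `delta` units."""
    return math.exp(beta * delta)

def as_percent(ratio):
    """Express an odds ratio as a percentage (1.0 -> 100%)."""
    return 100.0 * ratio

# Hypothetical coefficients for illustration only (NOT from the report).
beta_indicator = 0.40  # a 0/1 factor, such as a yes/no flag
beta_per_year = 0.06   # a continuous factor, per one-unit increase

# For a 0/1 factor the odds ratio compares "yes" against "no".
print(f"indicator: {as_percent(odds_ratio(beta_indicator)):.0f}%")

# For a continuous factor the ratio depends on how many units are compared,
# which is one reason an upper bound such as "< 250%" may be quoted.
print(f"10 units: {as_percent(odds_ratio(beta_per_year, delta=10)):.0f}%")
```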
NOTES: I use Logistic Regression to estimate the probability of heart disease and to identify its contributing factors.
-
Logistic Regression does not require a linear relationship between the independent and dependent variables.
-
The residuals from the model do not need to follow a normal distribution.
-
Logistic Regression does not require the assumption of homoscedasticity. Homoscedasticity means the error variance is constant across all values of the independent variables. In Logistic Regression the variance of a binary outcome is p(1 - p), so it changes with the predicted probability p.
-
The dependent variable in Logistic Regression is categorical (typically binary); it is not measured on an interval or ratio scale.
-
Logistic Regression requires little or no multicollinearity among the independent variables; that is, the independent variables should not be too highly correlated with each other.
-
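A simple first screen for multicollinearity is to check pairwise correlations between predictors. The sketch below is illustrative Python using only the standard library. The column names, the toy values, and the 0.8 threshold are all hypothetical and do not come from FHS_data. Pairwise correlation can miss collinearity that involves three or more variables, so a variance inflation factor check is a more thorough alternative.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| > threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Toy values for illustration only (NOT from FHS_data).
toy = {
    "sys_bp": [110, 120, 130, 140, 150, 160],
    "dia_bp": [70, 78, 82, 90, 95, 104],
    "cigs_per_day": [20, 0, 5, 0, 30, 10],
}

for name_a, name_b, r in correlated_pairs(toy):
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.2f})")
```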
The Logistic Regression model assumes a linear relationship between the independent variables and the log odds of the outcome.
-
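Written out, the linearity assumption applies to the log odds rather than to the probability itself. The symbols here are notation chosen for illustration: p is the probability of the outcome, x_1 through x_k are the independent variables, and the betas are the fitted coefficients.

```latex
\[
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\]
```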
The success of a Logistic Regression model depends on the sample size. Typically, it requires a large sample to achieve high accuracy.