The project covers the following:
- Exploratory Data Analysis
- Regression Model
- ANOVA
- Model Diagnostics
  - Check Constant Variance
  - Check Influential Points (Remove Influential Points)
  - Check Normality
  - Check Variance Inflation Factor
- Model Selection (Stepwise BIC)
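The full model regresses price on 23 predictors with ordinary least squares, using the statsmodels formula API:
import statsmodels.formula.api as smf
model = smf.ols("price~symboling+fueltype+aspiration+doornumber+carbody+drivewheel+enginelocation+wheelbase+carlength+carwidth+carheight+curbweight+enginetype+cylindernumber+enginesize+fuelsystem+boreratio+stroke+compressionratio+horsepower+peakrpm+citympg+highwaympg", data=train_data).fit()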
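Assuming categorical columns such as fueltype and carbody are stored as strings, the formula interface turns them into dummy variables automatically. A minimal sketch on hypothetical toy data (the DataFrame `toy` and its values are invented for illustration, not taken from the project's dataset) shows the treatment-coded term names this produces:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data standing in for train_data
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "fueltype": rng.choice(["diesel", "gas"], size=n),
    "enginesize": rng.uniform(60, 300, size=n),
})
# Price rises with engine size, with a shift for gas cars, plus noise
toy["price"] = (
    50 * toy["enginesize"]
    + np.where(toy["fueltype"] == "gas", -1500, 0)
    + rng.normal(0, 500, size=n)
)

model = smf.ols("price ~ fueltype + enginesize", data=toy).fit()
# The string column becomes a treatment-coded dummy; "diesel" (first level) is the baseline
print(model.params.index.tolist())
```

Each categorical predictor therefore contributes one coefficient per non-baseline level, which is why the ANOVA tables below test predictors as whole terms rather than individual coefficients.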
ANOVA Type 3 gives the same result as ANOVA Type 2: both tables show that [carbody, carwidth, curbweight, cylindernumber, enginelocation, enginesize, enginetype, fueltype, peakrpm, stroke] are significant predictors in the fitted model. The other ANOVA table shows additional significant predictors: [aspiration, carlength, doornumber, drivewheel, fuelsystem, horsepower, symboling, wheelbase].
Summary of the VIF process, listing the predictors that do not raise multicollinearity issues:
After removing the influential point, the normality issue is fixed and the model no longer shows heteroscedasticity.
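A minimal sketch of this diagnostic loop, under stated assumptions: influential rows are flagged with Cook's distance using the common 4/n rule of thumb (the project's actual cutoff is not given), the model is refit without them, and the refit's residuals are then checked for normality with Shapiro-Wilk and for constant variance with Breusch-Pagan. The toy data and its planted outlier are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical toy data with one planted outlier in row 0
rng = np.random.default_rng(3)
n = 100
toy = pd.DataFrame({"enginesize": rng.uniform(60, 300, n)})
toy["price"] = 50 * toy["enginesize"] + rng.normal(0, 500, n)
toy.loc[0, "price"] += 20000

formula = "price ~ enginesize"
model = smf.ols(formula, data=toy).fit()

# Cook's distance per observation; 4/n is an assumed rule-of-thumb cutoff
cooks_d, _ = model.get_influence().cooks_distance
influential = np.flatnonzero(cooks_d > 4 / n)

# Refit without the flagged rows
clean = toy.drop(index=toy.index[influential])
refit = smf.ols(formula, data=clean).fit()

# Shapiro-Wilk: H0 = residuals are normally distributed
_, shapiro_p = stats.shapiro(refit.resid)
# Breusch-Pagan: H0 = constant residual variance (homoscedasticity)
_, bp_p, _, _ = het_breuschpagan(refit.resid, refit.model.exog)
print(f"dropped {len(influential)} rows; Shapiro p={shapiro_p:.3f}; BP p={bp_p:.3f}")
```

Large p-values on both tests after the refit would match the outcome described above; small ones would suggest the problem lies elsewhere than in a few influential rows.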
We use stepwise selection with BIC (Step-BIC) to find the best model on the updated dataset.
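One way to implement BIC-based stepwise selection is backward elimination. Starting from the full formula, repeatedly drop the term whose removal lowers BIC the most, and stop when no removal helps. The helper `backward_bic` below is a hypothetical sketch of that variant; the project may have used a different search direction (for example R's `step()` with `k = log(n)`). Because whole formula terms are dropped, a categorical predictor such as carbody would be removed together with all of its dummy columns:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def backward_bic(data: pd.DataFrame, response: str, terms: list[str]):
    """Hypothetical helper: backward elimination of formula terms by BIC."""
    def fit(ts):
        rhs = " + ".join(ts) if ts else "1"  # intercept-only model when no terms remain
        return smf.ols(f"{response} ~ {rhs}", data=data).fit()

    current = list(terms)
    best = fit(current)
    while current:
        # BIC of every model with exactly one term removed
        candidates = [(fit([t for t in current if t != drop]).bic, drop) for drop in current]
        cand_bic, cand_drop = min(candidates)
        if cand_bic >= best.bic:
            break  # no single removal improves BIC
        current.remove(cand_drop)
        best = fit(current)
    return current, best


# Hypothetical toy data: only enginesize and horsepower drive price
rng = np.random.default_rng(4)
n = 150
cols = ["enginesize", "horsepower", "carheight", "boreratio"]
toy = pd.DataFrame(rng.normal(size=(n, 4)), columns=cols)
toy["price"] = 3 * toy["enginesize"] + 2 * toy["horsepower"] + rng.normal(0, 1, n)

selected, final = backward_bic(toy, "price", cols)
print(selected)
```

BIC's penalty of log(n) per parameter is heavier than AIC's penalty of 2 once n exceeds about 7, so it tends to favor smaller models.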
Final Selected Model:
price ~ enginesize + cylindernumber + enginetype + horsepower + carwidth + stroke + compressionratio + enginelocation + curbweight + peakrpm + carlength + doornumber