Most electrical power stations burn fossil fuels and emit gigatons of carbon dioxide into the air. For a sustainable world, the use of fossil fuels to produce electricity needs to change. Renewable energy is the main alternative to fossil fuels, but its unpredictable nature makes it less reliable, less affordable, and therefore less attractive. If we can predict the behavior of a renewable source such as wind energy using statistical learning algorithms, it could transform electricity production. Many factors affect wind energy production, including wind direction, wind speed, air temperature, and air density, and all of them need to be considered to get the highest electrical power from a wind turbine. Statistical learning can help identify the conditions under which a turbine delivers maximum power. The main objective of this project is to use statistical learning algorithms to predict the electrical power of a wind turbine.
I have tried to answer the following research questions:
- Is there a relationship between wind speed (predictor) and power (response), and how strong is it?
- Is the relationship between the predictor (wind speed) and the response (power) positive or negative?
- What is the predicted power associated with a wind speed of 12? What are the associated 95% confidence and prediction intervals?
- Is there a relationship between the full set of predictors and the response?
- Which predictors appear to have a statistically significant relationship to the response?
- Is there evidence of non-linear association between any of the predictors and the response?
- How does the model fit change if some non-significant variables are dropped from the model?
- Do any interactions appear to be statistically significant?
- Does the model fit change after adding interaction terms?
- Can we reduce the test error using the validation set approach, LOOCV, or k-fold cross-validation?
- What value of k (in k-fold cross-validation) gives the minimum cross-validation error?
- Which are the most important and the least significant variables (features) in best subset selection?
- Using adjusted R², how many variables does best subset selection choose?
- Using Cp, how many variables does best subset selection choose?
- Using BIC, how many variables does best subset selection choose?
- What are the 7 variables that best subset selection chooses?
- Which are the most important and the least significant variables (features) in forward stepwise selection?
- Which are the most important and the least significant variables (features) in backward stepwise selection?
- Do all methods (best subset, forward, backward) select the same features (say, the first 7)?
- What does the coefficient path show in ridge and lasso regression?
- How is the tuning parameter chosen, and what is its value in ridge regression?
- How many components should I select for the modeling stage in PCR and PLS?
- Are the polynomial terms of wind speed statistically significant at every degree?
- How does one decide the degrees of freedom in polynomial regression?
- Are all knots statistically significant in the spline analysis?
- Is the GAM with natural splines statistically significant at higher degrees of freedom?
- Which is better, a GAM with smoothing splines or a GAM with natural splines?
- What is the tree size in the regression tree?
- How do we know how far back to prune the tree?
- How many predictors are used in bagging?
- What is the minimum wind speed (threshold) needed to produce electrical power?
- What is the effect of outdoor temperature on wind turbine power production?
- Does the boosting model's performance improve after tuning the shrinkage parameter?
- How do I select the regularization parameter or budget control parameter?
- What is the value of k in the KNN regression model?
- Which model is best for this analysis?
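
As a rough illustration of how the first questions (strength and sign of the wind speed and power relationship, the prediction at a wind speed of 12) and the k-fold cross-validation question can be approached, here is a minimal sketch in plain Python. It is not the project's actual code: the data are synthetic and generated inside the script, and the names `fit_simple_ols`, `r_squared`, and `k_fold_mse` are hypothetical helpers introduced only for this example. It gives a point prediction only; confidence and prediction intervals would need a t-distribution quantile, which is left out here.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_mean = statistics.fmean(x)
    y_mean = statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variance in y explained by the fitted line."""
    y_mean = statistics.fmean(y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - rss / tss


def k_fold_mse(x, y, k, seed=0):
    """Estimate the test MSE of the simple linear model by k-fold CV."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        squared_error += sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out)
    return squared_error / len(x)


# Synthetic data (an assumption for illustration, not the project's dataset):
# power grows linearly with wind speed, plus Gaussian noise.
rng = random.Random(42)
wind_speed = [rng.uniform(3.0, 15.0) for _ in range(200)]
power = [100.0 * w - 250.0 + rng.gauss(0.0, 50.0) for w in wind_speed]

b0, b1 = fit_simple_ols(wind_speed, power)
r2 = r_squared(wind_speed, power, b0, b1)
predicted_at_12 = b0 + b1 * 12.0

print("slope sign:", "positive" if b1 > 0 else "negative")
print("R^2:", round(r2, 3))
print("predicted power at wind speed 12:", round(predicted_at_12, 1))
for k in (5, 10):
    print(f"{k}-fold CV MSE:", round(k_fold_mse(wind_speed, power, k), 1))
```

The sign of the slope answers the direction question, R² gives one measure of the strength of the relationship, and comparing the cross-validated MSE across values of k (or across candidate models) is one way to pick the setting with the lowest estimated test error.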