As a college student, you want to focus on real-life business problems and try to find solutions using Machine Learning. he objective is to predict the customers who are likely to churn, so that the company can take necessary actions to retain them. In this project, we will implement and compare three machine learning models: Random Forest, XGBoost, and KNeighbors on a Telco Churn Dataset.
Models:
Random Forest: The Random Forest model is an ensemble learning algorithm that uses multiple decision trees to make a prediction. It creates many decision trees and selects the best performing one based on the performance metrics.
XGBoost: The XGBoost is an optimized gradient boosting algorithm that is used for classification and regression problems. In this project, we will use accuracy and precision as the performance metrics and tune the hyperparameters using Random Search.
KNeighbors: The KNeighbors model is a simple, non-parametric, lazy learning algorithm that classifies a new data point based on the nearest neighbors.
Evaluation Metrics:
To evaluate the performance of the models, we will use accuracy and precision scores and their respective confusion matrices. The accuracy score measures the overall performance of the model, while the precision score measures the proportion of true positive predictions out of all positive predictions. The confusion matrix summarizes the number of true positive, true negative, false positive, and false negative predictions.
Conclusion:
The results of the three models on the Telco Churn Dataset showed that the Random Forest model performed the best, with an accuracy score of 95.6% and a precision score of 96.2%. The XGBoost model had an accuracy score of 95.4% and a precision score of 92.0%, and the KNeighbors model had an accuracy score of 89.2% and a precision score of 84.6%. Based on the results, it is recommended to use the Random Forest model for the Telco Churn problem.