Welcome to the Customer Churn Analysis project for Zanzibar Telecommunication! In this project, we dive into the world of data to help the telecommunication company enhance their profit and revenue margins by refining their customer retention strategies. We'll employ machine learning classification models to perform churn analysis on their customer datasets.
- Understand Classification Models and their applications.
- Assist the telecommunication company in comprehending their customer data.
- Calculate the lifetime value of each customer.
- Identify factors influencing customer churn rates.
- Predict if a customer is likely to churn.
- Data Exploration and Analysis.
- Handling Missing Data.
- Feature Engineering.
- Implementing Machine Learning Algorithms: Logistic Regression, Decision Trees, Support Vector Machine, Random Forest, etc.
- Model Evaluation and Interpretation using LIME and SHAP techniques.
- Model Optimization and Hyperparameter Tuning.
- Pandas, NumPy for data manipulation and cleaning.
- Matplotlib, Seaborn, Plotly for data visualization.
- Statsmodels, SciPy for statistical analysis.
- Scikit-learn for feature engineering and machine learning.
- Imbalanced-learn (SMOTE) for class imbalance.
- SHAP for interpreting model predictions.
- Pyodbc, dotenv for database access.
-
Load Datasets (Collect Initial Data):
- Import necessary libraries.
- Retrieve login credentials from .env and connect to the database.
- Load data into pandas DataFrame for manipulation and cleaning.
-
Exploratory Data Analysis (EDA):
- Check the first 5, last 5, and 10 random rows of the datasets.
- Examine the shape and information of the datasets.
- Check for duplicates and missing values.
- Perform descriptive analysis.
-
Data Quality Assessment:
- Identify key data quality issues.
- KNeighborsClassifier
- LinearRegression
- LogisticRegression
- AdaBoostClassifier
- DecisionTreeClassifier
- RandomForestClassifier
- GradientBoostingClassifier
- Support Vector Machine (SVC)
- GaussianNB
- Load and examine the dataset.
- Handle missing values through imputation and removing rows.
- Create or select relevant features that could impact customer churn.
- Scale or normalize numerical features.
- Encode categorical variables using techniques like one-hot encoding.
- Split the dataset into features (X) and target variable (y).
We consider eight (8) models for churn prediction:
- AdaBoostClassifier
- Logistic Regression
- Gaussian Naive Bayes
- Random Forest Classifier
- KNeighbors Classifier
- Decision Tree Classifier
- Gradient Boosting Classifier
- Support Vector Classifier (SVC)
We evaluate each model's performance using cross-validation to ensure reliable metrics. The key evaluation metrics used are accuracy for balanced datasets and F1-score for the unbalanced dataset.
Visualizing the confusion matrix for each model to gain insights into the performance and areas for improvement.
Fine-tuning the models' hyperparameters to achieve optimal performance. We utilize k-fold cross-validation to validate the model's generalizability.
Based on the results obtained from our evaluation, we predict future customer churn using the top-performing models. This insight can guide the business in taking proactive measures to retain valuable customers.
Your contributions are most welcome! Whether you're a data wizard or a curious soul, join our enchanting journey by contributing insights, optimizations, or enhancements.
This project is licensed under the GNU.2.0 License.
🔗 [Link to Article] [(https://www.linkedin.com/pulse/unveiling-power-collaboration-predicting-customer-churn-kofi-bamfo]
🔗 [Link to PowerBI Dashboard] [https://app.powerbi.com/groups/me/reports/655df298-36e9-4537-953c-37e410844b55/ReportSection?experience=power-bi]
| Edinam Doe Abla | https://github.com/doeabla |
| Enoch Taylor-Nketiah |https://github.com/akbamfo |
| Timothy Morenikeji Akinremi | https://github.com/timothyakinremi