Prediction of driver churn using machine learning techniques
This project predicts driver churn for Ola from driver attributes such as age, gender, education level, income, quarterly ratings, and total business value. The goal is to predict whether a driver will leave the company and to explore the factors that contribute to retention or churn.
The objectives are to:
- Analyze driver data to understand the factors influencing their performance and churn.
- Predict driver churn using ensemble models.
- Identify actionable insights that Ola can use to improve driver retention.
- Exploratory Data Analysis (EDA):
  - Perform univariate and bivariate analysis on key variables such as Age, Income, Total Business Value, and Quarterly Rating.
  - Visualize the data with scatter plots, box plots, and correlation heatmaps.
- Data Preprocessing:
  - Handle missing values using KNN imputation.
  - Perform feature engineering, class balancing, and encoding to prepare the data for machine learning models.
- Model Building:
  - Build predictive models using ensemble methods such as Random Forest and XGBoost.
- Results Evaluation:
  - Evaluate models using a confusion matrix, the ROC curve with its AUC score, and a classification report.
- Load the dataset and perform initial checks.
- Run univariate analysis of continuous and categorical variables.
- Run bivariate analysis, observing correlations between key variables.
- Apply KNN imputation to handle missing values.
- Engineer new features to improve prediction.
- Address class imbalance using SMOTE.
- Standardize numerical data and one-hot encode categorical variables.
- Build ensemble models for prediction.
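The preprocessing, modeling, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the `Churn` target column and the city codes are hypothetical, and `class_weight="balanced"` stands in for SMOTE so that the example needs only scikit-learn (SMOTE itself lives in the separate imbalanced-learn package). XGBoost is omitted for the same reason.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; values and the "Churn" column are illustrative only.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "Age": rng.integers(21, 58, n).astype(float),
    "Income": rng.normal(60000, 15000, n),
    "Quarterly Rating": rng.integers(1, 5, n).astype(float),
    "City": rng.choice(["C1", "C2", "C3"], n),
})
# Hypothetical target: lower-rated drivers churn more often.
churn_prob = np.where(df["Quarterly Rating"] <= 2, 0.7, 0.2)
df["Churn"] = (rng.random(n) < churn_prob).astype(int)
# Inject missing values so the imputer has something to do.
df.loc[rng.choice(n, 30, replace=False), "Age"] = np.nan
df.loc[rng.choice(n, 30, replace=False), "Income"] = np.nan

numeric_cols = ["Age", "Income", "Quarterly Rating"]
categorical_cols = ["City"]

# Scale first so KNN distances are not dominated by Income;
# StandardScaler ignores NaNs when fitting and passes them through.
numeric_steps = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer(n_neighbors=5)),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# class_weight="balanced" is a swapped-in alternative to SMOTE.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42)),
])

X = df[numeric_cols + categorical_cols]
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of churn

cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba)
print(cm)
print(classification_report(y_test, y_pred))
print(f"ROC-AUC: {auc:.3f}")
```

Wrapping the imputer, scaler, and encoder in a single pipeline means they are fit on the training split only, which avoids leaking test-set statistics into the model.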
The dataset includes driver data from 2019 to 2020 with features like:
- Driver_ID
- Age
- Gender
- City
- Income
- Quarterly Rating
- And more...
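With columns like those listed above, the initial checks and a first pass of univariate and bivariate analysis might look like the sketch below. The frame is synthetic and the value encodings (city codes, numeric gender flag) are assumptions made for illustration; the real dataset's encodings may differ.

```python
import numpy as np
import pandas as pd

# Synthetic frame that mirrors the listed column names; values are made up.
rng = np.random.default_rng(0)
n = 200
drivers = pd.DataFrame({
    "Driver_ID": np.arange(1, n + 1),
    "Age": rng.integers(21, 58, n).astype(float),
    "Gender": rng.choice([0, 1], n),           # assumed numeric encoding
    "City": rng.choice(["C1", "C2", "C3"], n),  # assumed city codes
    "Income": rng.normal(60000, 15000, n).round(),
    "Quarterly Rating": rng.integers(1, 5, n),
})
# Blank out every 25th Age value to imitate missing data.
drivers.loc[drivers.index[::25], "Age"] = np.nan

numeric_cols = ["Age", "Income", "Quarterly Rating"]

# Initial checks: missing values per column.
missing_counts = drivers.isna().sum()

# Univariate analysis: summary statistics and category frequencies.
numeric_summary = drivers[numeric_cols].describe()
city_counts = drivers["City"].value_counts()

# Bivariate analysis: pairwise Pearson correlations (NaNs are skipped pairwise).
corr = drivers[numeric_cols].corr()

print(missing_counts)
print(numeric_summary)
print(city_counts)
print(corr)
```

The correlation matrix computed here is the same table a correlation heatmap would visualize.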
To run this project locally:
- Clone the Repository:
  git clone https://github.com/Srinivaskoruprolu007/OLA-Ensemble-Learning.git
- Install Dependencies:
  pip install -r requirements.txt
- Run the Notebook: Open the Jupyter notebook to follow the steps for EDA and preprocessing.
- Language: Python
- Libraries: pandas, scikit-learn, XGBoost, matplotlib
For inquiries or further discussion:
- Name: Srinivas Koruprolu
- Email: srinivasg3112@gmail.com
- LinkedIn: Srinivas LinkedIn