Amresh Giri (amresh.giri14@gmail.com)
- After understanding the problem statement and exploring the dataset, I identified this as a binary classification problem.
- Did some basic data preprocessing, such as checking for NA/missing values, checking variable types, and computing variable statistics.
- This basic analysis showed that the target variable was highly imbalanced.
- Converted the categorical values to numeric by encoding them.
- Selected the best features using Recursive Feature Elimination (RFE).
- Tried different techniques for balancing the target variable: under-sampling, over-sampling, EasyEnsemble, and SMOTE.
- Tried different classification algorithms using the best balancing technique and the selected features.
- A Gradient Boosting Classifier, with hyperparameters tuned through GridSearchCV and trained with SMOTE, gave the best F1-score.
- pandas==0.24.2
- scikit-learn==0.21.3 (imported as sklearn)
- xgboost==1.0.0
- imbalanced-learn==0.5.0 (imported as imblearn)
- Install the required packages and run the cells in the Jupyter Notebook.
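The preprocessing and encoding steps described above can be sketched with pandas roughly as follows. This is a minimal illustration on a hypothetical toy DataFrame (the columns `age`, `city`, and `target` are made up, not the competition data), and it assumes one-hot encoding via `pd.get_dummies`; the notebook may use a different encoder.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Basic checks: missing values, variable types, statistics, class balance."""
    return {
        "missing": df.isna().sum(),                               # NA count per column
        "dtypes": df.dtypes,                                      # type of each variable
        "stats": df.describe(include="all"),                      # per-variable statistics
        "class_ratio": df[target].value_counts(normalize=True),   # imbalance check
    }


def encode_categoricals(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """One-hot encode every object/category column except the target (assumed encoding)."""
    cat_cols = [
        c for c in df.select_dtypes(include=["object", "category"]).columns
        if c != target
    ]
    return pd.get_dummies(df, columns=cat_cols)


# Hypothetical toy data with a 9:1 class imbalance and some missing values.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 55, 38, 47, 33, 60],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B", "A", None],
    "target": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

report = summarize(df, "target")
print(report["missing"])
print(report["class_ratio"])

encoded = encode_categoricals(df, "target")
print(encoded.columns.tolist())
```

The `class_ratio` entry is the quickest way to spot the kind of imbalance mentioned above: here one class holds 90% of the rows.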
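Recursive Feature Elimination, the feature selection step in the approach, can be sketched with scikit-learn's `RFE`. The data is synthetic, and the Gradient Boosting estimator and `n_features_to_select=8` are illustrative assumptions, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the competition data: 20 features, 5 informative,
# roughly 90% of samples in class 0 (an assumption, not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    weights=[0.9], random_state=42,
)

# RFE fits the estimator, drops the least important feature (step=1),
# and repeats until n_features_to_select remain. The estimator and the
# target of 8 features are illustrative choices.
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=50, random_state=42),
    n_features_to_select=8,
    step=1,
)
selector.fit(X, y)
X_selected = selector.transform(X)

print(selector.support_)   # boolean mask: True for kept features
print(selector.ranking_)   # rank 1 = selected; higher = eliminated earlier
```

RFE relies on the estimator exposing `feature_importances_` (or `coef_`), which is why a tree ensemble works here without extra configuration.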
I ranked 485th out of 3,740 registered participants (top 13 percent).