This project predicts hotel booking cancellations using the Hotel Booking Demand dataset available on Kaggle. Dataset link: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
The dataset contains 32 features. We performed a thorough exploratory data analysis, examining how each feature is distributed and what type of distribution it follows. We then identified missing values and any outliers and applied appropriate strategies to handle them. Finally, we separated the numerical and categorical variables so we could apply standardization, encoding, and other preprocessing steps to each group.
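The preprocessing described above could look roughly like the sketch below. It assumes scikit-learn and pandas; the IQR-based outlier capping, median/most-frequent imputation, and the helper names `cap_outliers_iqr` and `build_preprocessor` are hypothetical illustrations, not necessarily the exact strategies this project used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def cap_outliers_iqr(df, columns, k=1.5):
    """Hypothetical outlier strategy: clip each column to [Q1 - k*IQR, Q3 + k*IQR].

    NaNs are ignored when computing quantiles and left untouched by clip,
    so imputation can still happen later in the pipeline.
    """
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def build_preprocessor(df):
    """Split columns by dtype, then impute + scale numerics and impute + one-hot encode categoricals."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed imputation choice
        ("scale", StandardScaler()),
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed imputation choice
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0.0 forces a dense NumPy array as output.
    return ColumnTransformer(
        [("num", numeric_pipe, numeric_cols), ("cat", categorical_pipe, categorical_cols)],
        sparse_threshold=0.0,
    )
```

As a usage example, a toy frame with a few column names borrowed from the Kaggle dataset (values made up) can be passed through `cap_outliers_iqr(df, ["lead_time", "adr"])` and then `build_preprocessor(df).fit_transform(df)`. Bundling the steps in a `ColumnTransformer` means the same imputation and scaling statistics learned on the training split are reused on the test split.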
We trained six classification algorithms with hyperparameter tuning, including an SGD classifier, k-nearest neighbors (KNN), an XGBoost classifier (XGBClassifier), a decision tree classifier, and a random forest classifier. We evaluated model performance using the F1 score and accuracy; our best model achieved an accuracy of 94%.
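A minimal sketch of tuning several classifiers and comparing them on F1 score and accuracy is shown below. It assumes scikit-learn; the parameter grids, 3-fold cross-validation, 80/20 stratified split, and the helper `compare_models` are illustrative assumptions rather than the project's actual settings. XGBClassifier is left out only because it requires the separate `xgboost` package; it could be added to the dictionary in the same way.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical candidate models and small search grids (not the project's actual grids).
CANDIDATES = {
    "sgd": (SGDClassifier(random_state=0), {"alpha": [1e-4, 1e-3]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [5, 15]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [5, None]}),
    "forest": (RandomForestClassifier(random_state=0),
               {"n_estimators": [100], "max_depth": [10, None]}),
}


def compare_models(X, y, candidates, random_state=0):
    """Grid-search each candidate on F1, then report held-out accuracy and F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    results = {}
    for name, (estimator, grid) in candidates.items():
        # GridSearchCV clones the estimator, so CANDIDATES itself is not mutated.
        search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
        search.fit(X_train, y_train)
        y_pred = search.predict(X_test)
        results[name] = {
            "params": search.best_params_,
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    # Pick the model with the highest held-out F1 score.
    best = max(results, key=lambda n: results[n]["f1"])
    return best, results
```

Selecting on F1 rather than accuracy alone is a reasonable default when the canceled and not-canceled classes are imbalanced, since accuracy can look high for a model that mostly predicts the majority class. In practice the preprocessing transformer would sit in front of each estimator in a `Pipeline`.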