This was a machine learning Kaggle competition conducted by IIT Madras. This code earned me a rank in the top 80.
The data were collected through a survey designed to understand drivers' preferences for discounts and offers on dining and takeaway. The researcher collected the data by presenting different scenarios to various users.
Example Scenario: You are driving from IIT Madras to Chennai Airport along with your family and you get an offer (10 percent discount on the bill) from the famous Chinese restaurant in Guindy. Will you avail of the offer while traveling?
Along with the user response, some basic information about the users is collected.
- train.csv - the training set
- test.csv - the test set
- sample_submission.csv - a sample submission file in the correct format
- Data dictionary (column name: description):
  - 'offer expiration': number of days the offer is valid
  - 'income_range': income range
  - 'no_visited_Cold drinks': number of times visited cold-drink outlets
  - 'travelled_more_than_15mins_for_offer': Have you traveled more than 15 minutes to avail of an offer?
  - 'Restaur_spend_less_than20': number of times spent less than 20 dollars at a restaurant
  - 'Marital Status': marital status
  - 'restaurant type': type of restaurant
  - 'age': age
  - 'Prefer western over Chinese': Do you prefer Western over Chinese food?
  - 'travelled_more_than_25mins_for_offer': Have you traveled more than 25 minutes to avail of an offer?
  - 'travelled_more_than_5mins_for_offer': Have you traveled more than 5 minutes to avail of an offer?
  - 'no_visited_bars': number of times visited a bar
  - 'gender': gender
  - 'car': type of vehicle you use, in your own words
  - 'restuarant_same_direction_house': Is the restaurant offering the coupon located in the same direction as your house?
  - 'Cooks regularly': Do you cook regularly?
  - 'Customer type': Whom do you prefer to go with?
  - 'Qualification': qualification
  - 'is foodie': is a foodie
  - 'no_Take-aways': number of times opted for take-away
  - 'Job/Job Industry': type of industry you work in
  - 'restuarant_opposite_direction_house': Is the restaurant offering the coupon located in the opposite direction from your house?
  - 'has Children': Do you have children?
  - 'visit restaurant with rating (avg)': average rating of the restaurant that made the offer
  - 'temperature': current temperature
  - 'Restaur_spend_greater_than20': number of times spent more than 20 dollars at a restaurant
  - 'Travel Time': travel time to the restaurant that made the offer
  - 'Climate': current climate
  - 'drop location': Where are you heading?
  - 'Prefer home food': Do you prefer home food?
  - 'Offer Accepted': Did you accept the offer?
- Target variable: 'Offer Accepted'
The goal is to predict whether a prospective customer will accept the offer.
- My first step was to understand the problem and the features: their types, their contribution to the target, their missing values, and any correlations between them.
- Then I cleaned the data and imputed the missing values.
- Next, I scaled the numerical features and converted the ordinal and nominal (categorical) features to numerical values.
- After that, I split the dataset into a training set and a test set and built a baseline model to set a benchmark.
- Then I tried different classifiers:
  - k-nearest neighbors (KNN)
  - Random forest
  - Ensemble methods
    - Bagging
    - Boosting
      - AdaBoost
      - GradientBoosting
      - XGBoost
- The best accuracy came from XGBoost combined with RFE (recursive feature elimination); my final accuracy was 0.6311.
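The exploration step described above (feature types, missing values, correlations) can be sketched with pandas. This is a minimal illustration, not the author's code: it assumes the data has been loaded into a DataFrame (for example with `pd.read_csv("train.csv")`) and that 'Offer Accepted' is coded as 0/1. The `summarize` helper is hypothetical, and the tiny DataFrame below uses made-up values; only the column names come from the data dictionary.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: one row per feature with its dtype, missing-value
    count, number of distinct values and, for numeric columns only, the
    Pearson correlation with the target."""
    numeric = df.select_dtypes(include="number")
    if target in numeric:
        corr = numeric.corrwith(df[target])
    else:
        corr = pd.Series(dtype=float)
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
        "corr_with_target": corr,  # NaN for non-numeric columns
    })
    return summary.drop(index=target)


# Made-up rows for illustration; in practice: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "age": [21, 35, np.nan, 46, 28, 50],
    "temperature": [30, 55, 80, 80, 55, 30],
    "Climate": ["Summer", "Winter", np.nan, "Summer", "Spring", "Winter"],
    "Offer Accepted": [1, 0, 1, 1, 0, 0],  # assumed 0/1 target coding
})

summary = summarize(df, "Offer Accepted")
print(summary)
```

Correlation is only meaningful for numeric columns here; categorical features would need encoding (or a test such as chi-squared) before their relationship with the target can be measured.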
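The cleaning, imputation, scaling and encoding steps map naturally onto a scikit-learn `ColumnTransformer`. The sketch below is one possible setup under stated assumptions: which columns count as numeric, ordinal or nominal, the income band labels and their order, and the imputation strategies (median for numbers, most frequent for categories) are all illustrative guesses rather than details from the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical grouping of a few data-dictionary columns by feature type.
NUMERIC = ["temperature"]
ORDINAL = ["income_range"]
NOMINAL = ["Climate", "drop location"]

# Assumed income band labels, listed from lowest to highest (illustrative only).
INCOME_ORDER = [["low", "medium", "high"]]

preprocess = ColumnTransformer(
    [
        # Numeric: fill gaps with the median, then standardize to mean 0, std 1.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), NUMERIC),
        # Ordinal: fill gaps with the mode, then map bands to 0, 1, 2 in order.
        ("ord", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OrdinalEncoder(categories=INCOME_ORDER)),
        ]), ORDINAL),
        # Nominal: fill gaps with the mode, then one-hot encode each category.
        ("nom", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), NOMINAL),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Made-up rows with missing values in every column group.
df = pd.DataFrame({
    "temperature": [30.0, np.nan, 80.0, 55.0],
    "income_range": ["low", "high", np.nan, "medium"],
    "Climate": ["Summer", "Winter", "Summer", np.nan],
    "drop location": ["Home", "Work", "Home", "Home"],
})

X = preprocess.fit_transform(df)
print(X.shape)
```

Wrapping this transformer in a `Pipeline` together with a classifier means the medians, modes and scaling statistics are fitted on the training split only, which avoids leaking information from held-out rows.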
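For the train/test split and baseline step, the text does not say which baseline model was used. A majority-class `DummyClassifier` is one common choice, shown here on synthetic data from `make_classification` standing in for the preprocessed features; the class weights, the 80/20 split ratio and the random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix and a 0/1 target.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.45, 0.55], random_state=0
)

# Hold out 20% of the labelled rows; stratify so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Majority-class baseline: always predicts the most common training label.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print(f"baseline accuracy: {baseline_acc:.4f}")
```

Any real model is only worth keeping if it clearly beats this number; on a roughly balanced target the majority-class baseline sits a little above 0.5.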
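The classifiers listed above can be compared on an equal footing with 5-fold cross-validated accuracy, as in this sketch on synthetic data. The hyperparameters are placeholders, not the tuned values from the competition. XGBoost lives in the separate `xgboost` package; if it is installed, `xgboost.XGBClassifier()` can be added to the `models` dict in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Placeholder hyperparameters; each model would normally be tuned separately.
models = {
    "knn": KNeighborsClassifier(n_neighbors=15),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print from best to worst.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {acc:.4f}")
```

Cross-validation gives a less noisy comparison than a single train/test split, which matters when the differences between models are only a point or two of accuracy.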
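The final combination of boosting with recursive feature elimination can be sketched as a two-step pipeline. This sketch swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost so it runs with scikit-learn alone; `xgboost.XGBClassifier` also exposes `feature_importances_`, so it should work as the estimator inside `RFE` as well. The number of features to keep (10) and the step size are arbitrary assumptions, and this synthetic data says nothing about the 0.6311 accuracy reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data: 25 features, of which only 6 carry signal.
X, y = make_classification(
    n_samples=600, n_features=25, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RFE repeatedly fits the estimator and drops the `step` least important
# features (ranked by feature_importances_) until `n_features_to_select` remain;
# the final classifier is then trained on the surviving features only.
model = Pipeline([
    ("rfe", RFE(GradientBoostingClassifier(random_state=0),
                n_features_to_select=10, step=2)),
    ("clf", GradientBoostingClassifier(random_state=0)),
])
model.fit(X_train, y_train)

selected = model.named_steps["rfe"].support_  # boolean mask over the 25 columns
acc = accuracy_score(y_test, model.predict(X_test))
print(f"kept {selected.sum()} of {X.shape[1]} features, test accuracy {acc:.4f}")
```

Putting RFE inside the pipeline means feature selection is refitted on each training split, so the held-out accuracy is not inflated by features chosen with knowledge of the test rows.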