Code for The Analytics Edge (15.071x) competition.
amelia - impute missing values, train many models, predict, bag
validation.r - split the training set for validation, train and score random forest and naive Bayes,
plot variable importance from random forest
vectorize_and_predict_inplace.py - convert categorical to -1/0/1, train, write predictions
vectorize_validation.py - convert data to numbers only, train, get validation score
Get 0.74568 public / 0.77761 private AUC with vectorize_and_predict_inplace.py
and even better score with Amelia.
For description, see: