This repo contains a benchmark and sample code in Python for the Expedia Personalized Sort Competition, a machine learning challenge hosted by Kaggle in conjunction with Expedia.
The CreateCompetitionData directory contains the transformation code used to create the competition data files from the raw data. This code is provided for your information only; competition participants do not need to read or run it.
This version of the repo contains the Basic Python Benchmark. Future benchmarks may be included here as well and will be marked with git tags.
This benchmark is intended to provide a simple example of reading the data and creating the submission file. It is not meant to be a state-of-the-art benchmark for this problem.
Executing this benchmark requires Python 2.7, along with the Python packages sklearn version 0.13 and pandas version 0.10.1 (other versions may work, but have not been tested).
To run the benchmark:
- Download data.zip from the competition page. This contains the dataset as two CSV files, train.csv and test.csv.
- Switch to the "PythonBenchmark" directory
- Modify SETTINGS.json to include the paths to the data files, as well as a place to save the trained model and a place to save the submission
- Train the model by running:
    python train.py
- Make predictions on the validation set by running:
    python predict.py
- Make a submission with the output file
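The SETTINGS.json step above can be sketched as follows. This is only an illustration: the key names (train_path, test_path, model_path, submission_path) and the example paths are assumptions, so check the SETTINGS.json shipped in the PythonBenchmark directory for the names the scripts actually read.

```python
import json
import os
import tempfile

# Hypothetical key names and paths, chosen for illustration only.
# The SETTINGS.json in PythonBenchmark defines the real ones.
settings = {
    "train_path": "data/train.csv",
    "test_path": "data/test.csv",
    "model_path": "models/benchmark_model.pickle",
    "submission_path": "submissions/benchmark_submission.csv",
}


def write_settings(path, values):
    """Write the settings dictionary to disk as indented JSON."""
    with open(path, "w") as f:
        json.dump(values, f, indent=4)


def load_settings(path):
    """Read a settings file and return its contents as a dictionary."""
    with open(path) as f:
        return json.load(f)


# Round trip through a temporary directory so no real file is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), "SETTINGS.json")
write_settings(demo_path, settings)
loaded = load_settings(demo_path)
print(sorted(loaded.keys()))
```

Keeping all four paths in one file means train.py and predict.py can agree on where the model is saved without either path being hard-coded.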
This benchmark took less than 5 minutes to execute on a Windows 8 laptop with 8GB of RAM and 4 cores at 2.7GHz.
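For orientation, the final "create the submission file" step might look roughly like the sketch below. It is not the benchmark's own code. It assumes that a submission lists, for each search, the properties ordered from most to least likely to be chosen, and that the header is SearchId,PropertyId. The scores, ids, and function names are hypothetical, so confirm the required format on the competition page.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical (search id, property id, predicted score) rows standing in
# for the output of a trained model on the test set.
predictions = [
    (1, 101, 0.20),
    (1, 102, 0.75),
    (1, 103, 0.50),
    (2, 201, 0.10),
    (2, 202, 0.90),
]


def rank_properties(rows):
    """Group rows by search id and order each group by descending score."""
    by_search = defaultdict(list)
    for search_id, prop_id, score in rows:
        by_search[search_id].append((score, prop_id))
    ranked = []
    for search_id in sorted(by_search):
        # Highest predicted score first within each search.
        ordered = sorted(by_search[search_id], key=lambda pair: -pair[0])
        for _, prop_id in ordered:
            ranked.append((search_id, prop_id))
    return ranked


def write_submission(ranked, path):
    """Write the ranked pairs to a CSV file with an assumed header row."""
    with open(path, "w") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["SearchId", "PropertyId"])  # assumed column names
        writer.writerows(ranked)


ranked = rank_properties(predictions)
submission_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission(ranked, submission_path)
with open(submission_path) as f:
    submission_lines = f.read().splitlines()
print(submission_lines)
```

In the real pipeline the scores would come from the model saved by train.py, and the output path would come from SETTINGS.json rather than a temporary directory.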