In this project, we use the MovieLens 100k dataset to compare different algorithms for rating prediction and to build a movie recommendation system on top of it. The models are developed with Surprise. The dataset has been publicly available since 1998 and contains 100,000 ratings from 943 users on 1682 movies. Each user has rated at least 20 movies, and simple demographic information for the users (age, gender, occupation, zip code) is provided.
- Data preprocessing and split: create a training dataset and a testing dataset for the experiments.
- Rating prediction: develop an algorithm to predict the ratings in the testing set based on the information (ratings and others) in the training dataset, and evaluate the predictions using MAE and RMSE.
- Item recommendation: construct a recommendation list for each user, then evaluate the recommendation quality using precision, recall, and F-measure.
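
The evaluation metrics named above can be sketched in plain Python. This is a minimal illustration and not the notebook's actual code: the function names (`mae`, `rmse`, `precision_recall_f1`) and the toy data are assumptions made for the example, and the F-measure is taken to be the balanced F1 score.

```python
import math


def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    return sum(abs(true - pred) for true, pred in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((true - pred) ** 2 for true, pred in pairs) / len(pairs))


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one user's recommendation list.

    recommended: items shown to the user (hypothetical top-N list)
    relevant:    items the user actually liked in the testing set
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (made up for illustration, not taken from MovieLens).
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]
print(mae(pairs), rmse(pairs))

recommended = ["m1", "m2", "m3", "m4"]
relevant = {"m2", "m4", "m7"}
print(precision_recall_f1(recommended, relevant))
```

RMSE penalizes large errors more heavily than MAE, which is one reason to report both. Precision, recall, and F-measure are computed per user here and would typically be averaged over all users.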
├── data # Data used for the project
├── Documents # Holds documents related to the project
├── README.md # Read this first
└── code.ipynb # The code used for building the application
- Open code.ipynb in your Jupyter installation.
- Run all the cells. The code inside will handle installing the required packages.
- If needed, you can easily modify the last cell to get recommendations for specific users.
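
As a rough idea of what fetching recommendations for a specific user can look like, the sketch below picks the top-N items for one user from a list of predicted ratings. The contents of the notebook's last cell are not shown here, so the helper name `top_n_for_user`, the `(user_id, item_id, estimated_rating)` tuple layout, and the sample values are all assumptions made for this example.

```python
def top_n_for_user(predictions, user_id, n=10):
    """Return the n highest-rated (item_id, estimate) pairs for one user.

    predictions: iterable of (user_id, item_id, estimated_rating) tuples
                 (hypothetical layout, standing in for model output)
    """
    user_items = [(iid, est) for uid, iid, est in predictions if uid == user_id]
    # Sort by estimated rating, highest first.
    user_items.sort(key=lambda pair: pair[1], reverse=True)
    return user_items[:n]


# Made-up predictions for illustration only.
predictions = [
    ("196", "242", 3.9),
    ("196", "302", 4.4),
    ("196", "377", 2.1),
    ("186", "302", 3.2),
]

top = top_n_for_user(predictions, "196", n=2)
print(top)
```

Changing the `user_id` and `n` arguments is all that is needed to get a list for a different user or a different list length.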
RMSE and MAE scores | Precision, Recall, and F-score |
---|---|