Kjosev/Recommendation-engine
###################################################################
############################# README #############################
###################################################################
################ PROJECT 2 - RECOMMENDER SYSTEM ################
###################################################################
########################## TEAM #NoChill ###########################
###################################################################

Explanation of abbreviations (see report for details):
- ALS = alternating least squares
- SGD = stochastic gradient descent
- AVG = smart average
- GLBAVG = global average
- USRAVG = user average
- MVIAVG = item average
- CFI = item-based collaborative filtering
- CFU = user-based collaborative filtering

1) To reproduce the final submission, just run 'python run.py'. It combines the submissions generated with ALS, SGD, AVG, GLBAVG, USRAVG, MVIAVG, CFI and CFU, finds the best weights for blending them, and then creates a single blended submission. Each individual submission can be generated by running the corresponding script 'run_[method].py'. If a submission does not exist, 'run.py' automatically runs the script that generates it before blending.
To make the final submission fast to generate, we provide a precomputed submission for CFU: without its precomputed similarity matrix (which was too large to include in the submission), CFU takes a long time to compute. For CFI we use a precomputed similarity matrix ('movieSim.obj'). For ALS and SGD we provide precomputed user and item features. The code still works without these files, but it takes a long time because everything has to be retrained.

2) All the submissions needed by 'run.py' are created automatically by the code if they are not present in the folder.
To create them manually, run the following scripts:
- run_ALS.py (~10 min)
- run_SGD.py (~1 h)
- run_AVG.py (fast)
- run_GLBAVG.py (fast)
- run_MVIAVG.py (fast)
- run_USRAVG.py (fast)
- run_CFI.py (~2-3 min)
- run_CFU.py (~1-2 h without the precomputed similarity matrix)

3) Since the final submission is a blend of the submissions from multiple methods, we need to find the weight with which each method contributes. This is done by 'blend.py'. It builds a joint dataset from the method outputs in the 'data/methods' folder together with the true ratings, runs least squares on that dataset to find the weights, and saves them in 'coefs.obj', which is used by 'run.py'. If 'coefs.obj' does not exist, 'run.py' runs 'blend.py' first.

4) The files in 'data/methods/' were generated with the following procedure:
- split the rating matrix into 5 train/test pairs, containing 90% and 10% of the data respectively (script 'create_train_test.py');
- for each of the 5 pairs, run all 8 methods on the training part, generating predictions for the corresponding test part.
This takes a very long time for some of the methods, so we provide the files already generated.

Here is a guide to the files contained in this folder:
- 'run.py' creates the final submission. It needs the precomputed submissions of the different methods and the precomputed weights.
- 'train_ALS.py' factorizes the rating matrix with alternating least squares. It needs the original data and creates the files 'item_features_ALS.py' and 'user_features_ALS.py'.
- 'train_SGD.py' factorizes the rating matrix with stochastic gradient descent. It needs the original data and creates the files 'item_features_SGD.py' and 'user_features_SGD.py'.
- 'run_ALS.py' creates predictions for the whole matrix using the ALS factorization.
- 'run_SGD.py' creates predictions for the whole matrix using the SGD factorization.
- 'run_AVG.py' creates predictions for the whole matrix with a modified average method (see report).
- 'run_GLBAVG.py' creates predictions for the whole matrix with the global average.
- 'run_USRAVG.py' creates predictions for the whole matrix with the user average.
- 'run_MVIAVG.py' creates predictions for the whole matrix with the item average.
- 'run_CFI.py' creates predictions for the whole matrix with item-based collaborative filtering.
- 'run_CFU.py' creates predictions for the whole matrix with user-based collaborative filtering.
- 'blend.py' computes the blending weights from the precomputed predictions in the 'data/methods' folder.
- 'create_train_test.py' creates the 5 train/test pairs used for the very expensive training that generates the files in the 'data/methods' folder.
- 'collaborative.py' contains the function used to compute similarity matrices.
- 'helpers.py' contains various functions used by the scripts for averages, collaborative filtering and blending.
- 'helpers_MF.py' contains various functions used for matrix factorization.
- folder 'data' contains the precomputed files used by the algorithms described above, as well as the 'submissions' folder (which will be filled with all the submissions) and the 'methods' folder, with the very-long-to-compute matrices used to find the best weights for the different methods. The train/test data will be contained in a subfolder of 'data' named 'train_test'.
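The reuse-or-regenerate behaviour of 'run.py' (use a submission if it already exists, otherwise run the script that creates it) can be sketched in a few lines of Python. This is a minimal illustration, not the code in 'run.py': the helpers 'ensure_file' and 'fake_method' are hypothetical, and 'fake_method' only stands in for an expensive 'run_[method].py' script.

```python
import os
import tempfile
from typing import Callable


def ensure_file(path: str, generate: Callable[[str], None]) -> bool:
    """Create `path` with `generate` only if it does not exist yet.

    Returns True if the file had to be generated, False if it was reused.
    (Hypothetical helper illustrating the lazy generation done by run.py.)
    """
    if os.path.exists(path):
        return False
    # Make sure the target folder exists before the generator writes into it.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    generate(path)
    return True


def fake_method(path: str) -> None:
    # Placeholder for an expensive 'run_[method].py' script.
    with open(path, "w") as f:
        f.write("placeholder\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sub = os.path.join(tmp, "submissions", "submission_ALS.csv")
        print(ensure_file(sub, fake_method))  # first call generates the file
        print(ensure_file(sub, fake_method))  # second call reuses it
```

In a real pipeline, `generate` could wrap a call such as `subprocess.run([sys.executable, "run_ALS.py"], check=True)`; keeping the existence check in one helper means every method follows the same rule.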
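The blending step done by 'blend.py' amounts to an ordinary least-squares regression of the true ratings on the outputs of the methods. Below is a minimal NumPy sketch; 'blend_weights' and 'apply_blend' are hypothetical names, the optional intercept column is an assumption (the README does not say whether 'blend.py' uses one), and the rows from the 5 test sets in 'data/methods' are assumed to be stacked into `predictions` and `truth`.

```python
import numpy as np


def blend_weights(predictions, truth, intercept=True):
    """Least-squares weights for combining method predictions.

    predictions: (n_ratings, n_methods) array, one column per method
    truth:       (n_ratings,) array of the true held-out ratings
    """
    X = predictions
    if intercept:
        # Optional constant column so the blend can also learn a global offset.
        X = np.hstack([X, np.ones((X.shape[0], 1))])
    # Solve min ||X w - truth||^2; lstsq also returns residuals, rank, singular values.
    weights, *_ = np.linalg.lstsq(X, truth, rcond=None)
    return weights


def apply_blend(predictions, weights, intercept=True):
    # Weighted sum of the method predictions (plus the offset, if one was fitted).
    if intercept:
        return predictions @ weights[:-1] + weights[-1]
    return predictions @ weights
```

At prediction time the same weights would be applied to the whole-matrix outputs of the 8 methods, which is the role 'coefs.obj' plays for 'run.py'.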
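The splitting done by 'create_train_test.py' could look roughly like the sketch below. It assumes the ratings are stored in a dense NumPy array of shape (users, items) where 0 marks a missing rating (the repository may use a sparse matrix instead), and, since the README does not say how the 5 pairs relate to each other, it draws 5 independent random 90/10 splits, one per seed. The function names are hypothetical.

```python
import numpy as np


def split_ratings(ratings, test_ratio=0.1, seed=0):
    """Move a random `test_ratio` share of the observed entries into a test matrix.

    Assumes a dense (users x items) array where 0 marks a missing rating.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(ratings)
    n_test = int(round(test_ratio * rows.size))
    # Pick exactly n_test observed entries without replacement.
    test_idx = rng.permutation(rows.size)[:n_test]
    r, c = rows[test_idx], cols[test_idx]
    train = ratings.copy()
    test = np.zeros_like(ratings)
    test[r, c] = ratings[r, c]
    train[r, c] = 0  # hide the test ratings from the training matrix
    return train, test


def make_couples(ratings, n_couples=5, test_ratio=0.1):
    # One independent random split per seed (assumption: test sets may overlap).
    return [split_ratings(ratings, test_ratio, seed) for seed in range(n_couples)]
```

Each (train, test) pair partitions the observed ratings, so every method can be trained on `train` and its predictions compared with `test`.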
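What 'train_ALS.py' and 'run_ALS.py' do can be illustrated with a minimal NumPy version of alternating least squares with L2 regularization. It again assumes a dense rating array with 0 for missing entries; the names 'als', 'predict_all', 'k', 'lambda_user' and 'lambda_item' are hypothetical, and the hyperparameters used in the repository are not stated here. Each half-step fixes one factor matrix and solves a small ridge regression per user (or per item) over the observed ratings only.

```python
import numpy as np


def als(ratings, k=10, lambda_user=0.1, lambda_item=0.1, n_iter=20, seed=0):
    """Factorize ratings ~ user_f @ item_f.T using only the observed entries.

    `ratings` is a dense (n_users, n_items) array where 0 marks a missing value.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    observed = ratings != 0
    user_f = rng.normal(scale=0.1, size=(n_users, k))
    item_f = rng.normal(scale=0.1, size=(n_items, k))
    eye = np.eye(k)
    for _ in range(n_iter):
        # Item features fixed: ridge regression for each user over the items it rated.
        for u in range(n_users):
            cols = observed[u]
            Z = item_f[cols]
            user_f[u] = np.linalg.solve(Z.T @ Z + lambda_user * eye, Z.T @ ratings[u, cols])
        # User features fixed: ridge regression for each item over the users who rated it.
        for i in range(n_items):
            rows = observed[:, i]
            W = user_f[rows]
            item_f[i] = np.linalg.solve(W.T @ W + lambda_item * eye, W.T @ ratings[rows, i])
    return user_f, item_f


def predict_all(user_f, item_f):
    # Dense prediction for every (user, item) pair from the two factor matrices.
    return user_f @ item_f.T
```

Because each subproblem is solved exactly, ALS needs few iterations, but every iteration solves one k x k system per user and per item.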
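For comparison, 'train_SGD.py' factorizes the same matrix by stochastic gradient descent. The sketch below is a generic SGD matrix factorization under the same dense-array assumption; 'sgd_mf', 'gamma' (learning rate) and 'lambda_' (regularization) are hypothetical names, and the repository's actual schedule and hyperparameters are not stated in this README.

```python
import numpy as np


def sgd_mf(ratings, k=10, gamma=0.01, lambda_=0.1, n_epochs=20, seed=0):
    """Matrix factorization by stochastic gradient descent on observed entries.

    `ratings` is a dense (n_users, n_items) array where 0 marks a missing value.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    users, items = np.nonzero(ratings)
    user_f = rng.normal(scale=0.1, size=(n_users, k))
    item_f = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_epochs):
        # Visit the observed ratings in a fresh random order every epoch.
        for t in rng.permutation(users.size):
            u, i = users[t], items[t]
            err = ratings[u, i] - user_f[u] @ item_f[i]
            old_user = user_f[u].copy()  # both updates use the pre-step values
            user_f[u] += gamma * (err * item_f[i] - lambda_ * user_f[u])
            item_f[i] += gamma * (err * old_user - lambda_ * item_f[i])
    return user_f, item_f
```

Each step only touches one user row and one item row, which is cheap, but many passes over the data are needed; this is consistent with SGD taking much longer than ALS in the timings above.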
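The three plain baselines ('run_GLBAVG.py', 'run_USRAVG.py', 'run_MVIAVG.py') can be sketched as below, assuming a dense rating array with 0 for missing entries. The "smart average" of 'run_AVG.py' is defined in the report and is not reproduced here. All function names are hypothetical, and the fallback to the global mean for users or items without ratings is an assumption.

```python
import numpy as np


def global_mean(ratings):
    # Mean of the observed (non-zero) ratings.
    return float(ratings[ratings != 0].mean())


def _axis_means(ratings, axis):
    observed = ratings != 0
    counts = observed.sum(axis=axis)
    sums = ratings.sum(axis=axis)  # missing entries are 0, so they add nothing
    # Rows/columns without any rating fall back to the global mean (assumption).
    return np.where(counts > 0, sums / np.maximum(counts, 1), global_mean(ratings))


def user_means(ratings):
    return _axis_means(ratings, axis=1)


def item_means(ratings):
    return _axis_means(ratings, axis=0)


def predict_average(ratings, kind):
    """Whole-matrix prediction for kind in {'global', 'user', 'item'}."""
    n_users, n_items = ratings.shape
    if kind == "global":
        return np.full((n_users, n_items), global_mean(ratings))
    if kind == "user":
        return np.repeat(user_means(ratings)[:, None], n_items, axis=1)
    if kind == "item":
        return np.repeat(item_means(ratings)[None, :], n_users, axis=0)
    raise ValueError(f"unknown kind: {kind}")
```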
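Item-based collaborative filtering ('run_CFI.py', with similarities from 'collaborative.py') can be illustrated as follows. This sketch assumes cosine similarity between item columns and predicts a rating as the similarity-weighted average of the user's ratings on the k most similar items it has rated; the actual similarity measure and neighbourhood size used in the repository may differ, and the function names are hypothetical.

```python
import numpy as np


def item_similarity(ratings):
    """Cosine similarity between item columns (0 = missing rating)."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items with no ratings
    normalized = ratings / norms
    return normalized.T @ normalized


def predict_item_cf(ratings, sim, user, item, k=20):
    """Similarity-weighted average of `user`'s ratings on items similar to `item`."""
    rated = np.nonzero(ratings[user])[0]
    rated = rated[rated != item]  # never use the target rating itself
    if rated.size == 0:
        return float(ratings[ratings != 0].mean())  # fallback: global mean
    # Keep the k rated items most similar to `item`.
    neigh = rated[np.argsort(sim[item, rated])[::-1][:k]]
    weights = sim[item, neigh]
    denom = np.abs(weights).sum()
    if denom == 0:
        return float(ratings[ratings != 0].mean())
    return float(weights @ ratings[user, neigh] / denom)
```

In this sketch, user-based filtering (CFU) is the same computation on the transposed matrix: `predict_item_cf(ratings.T, item_similarity(ratings.T), user=item, item=user)`. With many users the user-user similarity matrix becomes very large, which fits the README's remark that the CFU similarity matrix was too big to include.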