cu2rec is a matrix factorization library designed to accelerate training of recommender system models on GPUs with CUDA. It implements parallel stochastic gradient descent (SGD) for training the matrix factorization model.
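For orientation, a standard biased matrix factorization formulation (cu2rec's exact loss, regularization, and update rules may differ from this sketch) predicts a rating as

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^\top q_i$$

and, for each observed rating with error $e_{ui} = r_{ui} - \hat{r}_{ui}$, applies SGD updates of the form

$$p_u \leftarrow p_u + \eta\,(e_{ui}\,q_i - \lambda\,p_u), \qquad q_i \leftarrow q_i + \eta\,(e_{ui}\,p_u - \lambda\,q_i),$$

with analogous updates for the user and item biases ($\mu$ is typically the global mean rating). Parallel SGD runs many of these per-rating updates concurrently across GPU threads.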
The input data should be a CSV file in the form `userId,itemId,rating` and should have a header. If the user IDs and item IDs are not sequential, run `python preprocessing/map_items.py <ratings_file>` to convert them into sequential integers starting at 1. Once you have a mapped CSV, you can use `python preprocessing/split_to_test_train.py <mapped_file> <test_ratio>` to split the data into training and test sets to use with `mf.cu`.
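For example, a mapped ratings file might look like the following (the header and values shown are illustrative):

```csv
userId,itemId,rating
1,1,4.0
1,3,2.5
2,1,5.0
2,2,3.0
```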
Alternatively, you can use the datasets below; a worked MovieLens example follows the list.

- MovieLens:
  - Download the MovieLens data and save it in the `data` folder.
  - Run `python preprocessing/map_items.py <ratings_file>` to create a user-item mapped ratings file.
  - Run `python preprocessing/split_to_test_train.py <mapped_file> <test_ratio>` to split it into training and test files.
- Netflix:
  - Download the Netflix dataset and place it under `data/datasets/netflix`.
  - Run `python preprocessing/map_netflix.py` to create the mapped training and test files.
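Putting the MovieLens steps together, an end-to-end preprocessing run might look like the sketch below. The input path, the mapped-file name produced by `map_items.py`, and the 0.2 test ratio are all assumptions for illustration:

```bash
# Map raw user/item IDs to sequential integers starting at 1
python preprocessing/map_items.py data/ratings.csv

# Split the mapped file into training and test sets (20% test here);
# use whatever mapped file name map_items.py actually writes
python preprocessing/split_to_test_train.py data/ratings_mapped.csv 0.2
```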
To compile the code:

- SSH into Prince or `cuda2` using your NYU credentials.
- Request an interactive GPU session: `srun -t5:00:00 --mem=30000 --gres=gpu:1 --pty /bin/bash`
- Load the CUDA module: `module load cuda/9.2.88`
- Build the project: `cd matrix_factorization && make`
The Makefile compiles for compute capability 5.2. If your GPU does not support that, change the Makefile to target your device's compute capability. The code has been tested down to compute capability 3.5.
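As an illustration only (the actual variable and flag names in cu2rec's Makefile may differ), switching from compute capability 5.2 to 3.5 would look something like this:

```make
# Illustrative sketch; check the Makefile for the real variable name.
# Default: compute capability 5.2
# NVCCFLAGS += -gencode arch=compute_52,code=sm_52
# Example: compute capability 3.5
NVCCFLAGS += -gencode arch=compute_35,code=sm_35
```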
To train a model:

```
make mf
bin/mf -c <config_file> <ratings_file_train> <ratings_file_test>
```
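A concrete invocation might look like the following; the config and data paths are placeholders rather than files shipped with the repository:

```bash
# Example only: substitute your own config and mapped train/test splits
bin/mf -c config.cfg data/ratings_train.csv data/ratings_test.csv
```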
To run all of the experiments mentioned in the report, `cd experiments` and run the included bash scripts. `cu2rec.sh` will give you the total runtimes and error metrics for all configurations, while `cu2rec_prof.sh` will give you all the `nvprof` results. Make sure you have all the data as described in the data section.
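For example, since these are bash scripts run from the `experiments` directory:

```bash
cd experiments
bash cu2rec.sh        # total runtimes and error metrics for all configurations
bash cu2rec_prof.sh   # nvprof profiling results
```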
To generate predictions with a trained model:

- Make sure the user data is in the same ratings format as MovieLens.
- Build and run the predictor:

```
make predict
bin/predict -c <config_file> -i <trained_item_bias_file> -g <trained_global_bias_file> -q <trained_Q_file> <ratings_file>
```
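An example invocation might look like the following; all file names are placeholders, so substitute the files your own training run produced:

```bash
# Paths below are illustrative placeholders, not files created by default
bin/predict -c config.cfg \
    -i item_bias.csv \
    -g global_bias.csv \
    -q Q.csv \
    data/ratings_test.csv
```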
To build the tests:

```
cd tests
make
```

- If you want to run all tests: `make run_all`
- Otherwise, run an individual test binary: `bin/test_{}`