ml-latest-small: MovieLens latest small dataset from here.
environment
Spark
pyspark
numpy
pandas
matplotlib
seaborn
We use Spark APIs to implement matrix factorization with the Alternating Least Squares (ALS) algorithm and predict movie ratings on the MovieLens small dataset. Data exploration and a DataFrame-based approach are shown in Spark_MovieLens.ipynb. An RDD-based approach is shown in Spark_MovieLens_RDD.ipynb.
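
For readers new to ALS, here is a minimal sketch of the idea in plain NumPy. It is not the code from the notebooks, which use Spark. The function names `als` and `rmse`, the toy rating matrix, and the hyperparameters (`rank`, `reg`, `iters`) are all assumptions made for illustration. The sketch alternates between two ridge-regression steps: solve for the user factors with the item factors fixed, then solve for the item factors with the user factors fixed, using only the observed ratings.

```python
import numpy as np


def als(ratings, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Factor ratings ~ U @ V.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)  # L2 regularization term

    for _ in range(iters):
        # Hold V fixed: each user's factors are a small ridge regression
        # over the items that user has rated.
        for u in range(n_users):
            rated = mask[u]
            Vu = V[rated]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ ratings[u, rated])
        # Hold U fixed: same step for each item over the users who rated it.
        for i in range(n_items):
            raters = mask[:, i]
            Ui = U[raters]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ ratings[raters, i])
    return U, V


def rmse(ratings, mask, U, V):
    """Root-mean-square error on the observed entries only."""
    err = (ratings - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


# Toy user x movie matrix (hypothetical data); 0 marks a missing rating.
ratings = np.array(
    [
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
        [0, 1, 5, 4],
    ],
    dtype=float,
)
mask = ratings > 0

U, V = als(ratings, mask)
predicted = U @ V.T  # dense matrix, including predictions for missing entries
print("training RMSE:", rmse(ratings, mask, U, V))
```

Each inner solve is independent of the others within the same half-step, which is what makes ALS a good fit for a distributed engine such as Spark.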