fferegrino/MrPageRank

PageRank calculation - Big Data

This repository hosts the code for both assessed exercises of the Big Data course at the University of Glasgow.

(You can watch a video I made about the course here: https://youtu.be/bQ-2mZoWLGE)

General task

The task was to implement a watered-down version of the PageRank algorithm over a parsed version of the complete Wikipedia edit history as of January 2008.

To accomplish this, we had to design and implement algorithms for parsing, filtering, projecting, and transforming the data.
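As a rough illustration of what a simplified PageRank can look like, here is a minimal in-memory sketch in Python. It is not the code from this repository. It assumes the common non-normalised form `PR(u) = (1 - d) + d * sum(PR(v) / L(v))`, where `v` ranges over the pages linking to `u` and `L(v)` is the number of distinct outlinks of `v`. It also assumes a damping factor of 0.85, a fixed number of iterations, and that self-links and duplicate links are ignored. The function name `simplified_pagerank` and its parameters are hypothetical.

```python
from collections import defaultdict


def simplified_pagerank(links, iterations=10, damping=0.85):
    """Compute simplified PageRank scores.

    links: dict mapping a page title to an iterable of titles it links to.
    Assumptions (not taken from the repository): every page starts at 1.0,
    scores are not normalised, self-links and duplicate links are ignored.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    scores = {page: 1.0 for page in pages}

    for _ in range(iterations):
        contributions = defaultdict(float)
        for source, targets in links.items():
            # Drop duplicate links and links from a page to itself.
            unique_targets = set(targets) - {source}
            if not unique_targets:
                continue
            # Each source splits its current score evenly among its outlinks.
            share = scores[source] / len(unique_targets)
            for target in unique_targets:
                contributions[target] += share
        # Pages with no inlinks keep only the (1 - damping) base score.
        scores = {
            page: (1 - damping) + damping * contributions[page]
            for page in pages
        }
    return scores
```

For example, `simplified_pagerank({"A": ["B"], "B": ["A"]})` returns a score for both `A` and `B`. In a MapReduce or Spark setting the inner loop would correspond to emitting contributions per target and summing them per key, but that mapping is only a suggestion here.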

Exercise 1 (MapReduce)

Solution

The solution and its explanation can be found in the wiki folder.

Exercise 2 (Spark)

The task for this exercise was the same as the previous one, but this time the program needed to offer the option of computing PageRank scores at a user-defined point in time.
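One way to support a user-defined point in time is to keep, for each article, only its most recent revision at or before the cutoff, and then run PageRank on the resulting link graph. The sketch below is in plain Python rather than Spark and is not taken from this repository. It assumes each revision is a `(title, timestamp, outlinks)` tuple and that timestamps are ISO 8601 strings in a single uniform format, so they can be compared as strings. The name `snapshot_at` is hypothetical.

```python
def snapshot_at(revisions, cutoff):
    """Return {title: outlinks} using the latest revision at or before cutoff.

    revisions: iterable of (title, timestamp, outlinks) tuples (assumed shape).
    cutoff: timestamp in the same format as the revision timestamps.
    """
    latest = {}
    for title, timestamp, outlinks in revisions:
        # Ignore edits made after the requested point in time.
        if timestamp > cutoff:
            continue
        # Keep only the most recent surviving revision per article.
        if title not in latest or timestamp > latest[title][0]:
            latest[title] = (timestamp, outlinks)
    return {title: outlinks for title, (_, outlinks) in latest.items()}
```

In Spark the same idea would typically be expressed as a filter on the timestamp followed by a per-key reduction that keeps the newest revision, though the details depend on how the input is parsed.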

Solution

The solution and its explanation can be found in the wiki-spark/scala folder.