Big Data - MapReduce and Spark
This repository implements the following tasks:
- WordCount and SetDifference using MapReduce. Command to run: `python mapReduce.py`
- WordCount and SetDifference using Spark. Command to run: `python spark_wordCount_setDifference.py`
- Find the frequency of each industry word in the Blog Authorship Corpus. Command to run: `python spark-industryNameFrequency.py [path to the corpus directory]`
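The MapReduce tasks listed above can be illustrated with a single-process simulation of the map, shuffle, and reduce phases. This is a minimal sketch in plain Python, not the contents of `mapReduce.py`. The helper names (`map_reduce`, `word_count`, `set_difference`) are hypothetical, and SetDifference is assumed to mean "elements of R that are not in S".

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper and reducer over records in one process (hypothetical helper)."""
    # Map + shuffle: group every emitted value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce: combine each key's values; a reducer returning None drops the key.
    results = {}
    for key, values in grouped.items():
        output = reducer(key, values)
        if output is not None:
            results[key] = output
    return results


def wc_mapper(line):
    # Emit (word, 1) for every lower-cased, whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def wc_reducer(word, counts):
    # Total number of occurrences of one word.
    return sum(counts)


def sd_mapper(record):
    # record is (set_name, elements); tag each element with the set it came from.
    set_name, elements = record
    for element in elements:
        yield element, set_name


def sd_reducer(element, sources):
    # Keep the element only if it never appeared in S.
    return element if "S" not in sources else None


def word_count(lines):
    return map_reduce(lines, wc_mapper, wc_reducer)


def set_difference(r, s):
    # Assumed semantics: R - S, i.e. elements of R that are not in S.
    return set(map_reduce([("R", r), ("S", s)], sd_mapper, sd_reducer))
```

For example, `word_count(["the quick fox", "the lazy dog"])` counts "the" twice, and `set_difference({1, 2, 3, 4}, {2, 4, 6})` returns `{1, 3}`. Hadoop or Spark would run the same phases distributed across workers; in PySpark, word count is commonly written with `flatMap`, `map`, and `reduceByKey`, and `RDD.subtract` computes a set difference, though whether `spark_wordCount_setDifference.py` uses those calls is not stated here.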
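The industry-word task is only named, not described, so the following is a sketch under stated assumptions, written in plain Python rather than Spark. It assumes that corpus files follow the Blog Authorship Corpus naming `<id>.<gender>.<age>.<industry>.<sign>.xml`, that the industry names come from that fourth field, and that "frequency" means how often each industry name occurs as a word in the blog text. The placeholder `indUnk` (assumed to mean "industry unknown") is skipped. All function and constant names are hypothetical.

```python
import os
import re
from collections import Counter

UNKNOWN_INDUSTRY = "indunk"  # assumed placeholder for "industry unknown"
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")  # lower-case words; hyphenated words kept whole


def industry_from_filename(filename):
    # Assumed naming: <id>.<gender>.<age>.<industry>.<sign>.xml
    parts = filename.split(".")
    return parts[3].lower() if len(parts) == 6 else None


def load_corpus(directory):
    # Yield (filename, text) for each .xml file; XML markup is not stripped here.
    for name in sorted(os.listdir(directory)):
        if name.endswith(".xml"):
            path = os.path.join(directory, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                yield name, handle.read()


def industry_word_frequency(documents):
    # documents: iterable of (filename, text) pairs, e.g. from load_corpus().
    docs = list(documents)
    # Collect the set of industry names from the file names.
    industries = {industry_from_filename(name) for name, _ in docs}
    industries -= {None, UNKNOWN_INDUSTRY}
    # Count every token in the text that matches an industry name.
    counts = Counter()
    for _, text in docs:
        for word in WORD_RE.findall(text.lower()):
            if word in industries:
                counts[word] += 1
    return counts
```

For instance, `industry_word_frequency(load_corpus(path)).most_common(10)` would list the ten most frequent industry words for a corpus directory `path`. Industry labels that are not ordinary words would never match a token under this simple tokenization, so the actual script may normalize names differently.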