big_data_analysis_hadoop_stack

A full-fledged data analysis project using the Hadoop stack.

Steps performed in the project:

Acquire the top 200,000 posts by view count.
Using Pig or MapReduce, extract, transform, and load the data as applicable.
Using Hive Query Language, compute:
I. The top 10 posts by score
II. The top 10 users by post score
III. The number of distinct users who used the word “Hadoop” in one of their posts
Using MapReduce, calculate the per-user TF-IDF and find the 10 most used words, excluding stop words.
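
The per-user TF-IDF step can be sketched in plain Python as a map phase followed by a reduce phase. This is only an illustration under stated assumptions, not the code in Mappers_Reducers: it treats all posts by one user as a single document, uses a tiny made-up stop-word list, and ranks words by TF-IDF score. The names `STOP_WORDS`, `tokenize`, `map_phase`, `reduce_counts`, `tfidf_per_user`, and `top_words` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; the project's list is not shown here).
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "i", "on", "for", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def map_phase(posts):
    """Map: emit ((user, word), 1) for every non-stop word in every post."""
    for user, body in posts:
        for word in tokenize(body):
            yield (user, word), 1


def reduce_counts(pairs):
    """Reduce: sum the emitted counts for each (user, word) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def tfidf_per_user(posts):
    """Return {user: {word: tf-idf}}, treating each user's posts as one document."""
    counts = reduce_counts(map_phase(posts))
    words_per_user = Counter()          # total non-stop words written by each user
    users_per_word = defaultdict(set)   # which users used each word
    for (user, word), n in counts.items():
        words_per_user[user] += n
        users_per_word[word].add(user)
    n_users = len(words_per_user)
    scores = defaultdict(dict)
    for (user, word), n in counts.items():
        tf = n / words_per_user[user]
        idf = math.log(n_users / len(users_per_word[word]))
        scores[user][word] = tf * idf
    return scores


def top_words(scores, user, k=10):
    """Return the k highest-scoring (word, score) pairs for a user."""
    ranked = sorted(scores[user].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Calling `top_words(tfidf_per_user(posts), user)` with `posts` as a list of `(user, body)` pairs returns up to ten words for that user. With Hadoop Streaming, the map and reduce functions would typically live in separate scripts that read from standard input, and the job would likely be split into several passes.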

Refer to "Documentation" for a step-by-step guide.

