vaibhavoberoi/big_data_analysis_hadoop_stack

big_data_analysis_hadoop_stack

A full-fledged data analysis project using the Hadoop stack.

Steps performed in the project:

  1. Acquire the top 200,000 posts by view count.
  2. Using Pig or MapReduce, extract, transform, and load the data as applicable.
  3. Using Hive Query Language (HiveQL), compute:
     - I. The top 10 posts by score
     - II. The top 10 users by post score
     - III. The number of distinct users who used the word “Hadoop” in one of their posts
  4. Using MapReduce, calculate the per-user TF-IDF and find the 10 most used words, excluding stop words.
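
The per-user TF-IDF in step 4 can be sketched in plain Python to show the arithmetic involved. This is only an illustration under stated assumptions, not the project's MapReduce job: each user's posts are treated as one "document", term frequency is a word's share of that user's words, and inverse document frequency is `log(number of users / users who used the word)`, which is one common variant. The names `per_user_tfidf`, `tokenize`, and `STOP_WORDS` are hypothetical, and the stop-word list is deliberately tiny.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "i", "it", "for", "on", "with"}

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def per_user_tfidf(posts, top_n=10):
    """posts: iterable of (user_id, post_text) pairs.

    Returns {user_id: [(word, tfidf), ...]} with at most top_n words per user,
    highest score first.
    """
    # Roughly the "map + combine" stage: word counts grouped by user.
    counts = defaultdict(Counter)
    for user_id, text in posts:
        counts[user_id].update(tokenize(text))

    # Document frequency: the number of users who used each word at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    n_users = len(counts)
    result = {}
    for user_id, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (count / total) * math.log(n_users / doc_freq[word])
            for word, count in word_counts.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[user_id] = ranked[:top_n]
    return result


sample_posts = [
    ("u1", "Hadoop is great and Hadoop scales"),
    ("u2", "Pig is a scripting layer for Hadoop"),
]
top_words = per_user_tfidf(sample_posts)
```

In this sample, "hadoop" is used by both users, so its IDF is `log(2/2) = 0` and it ranks last for each of them, while words unique to one user score higher. In a real MapReduce job the counting and document-frequency stages would typically be separate passes over the data.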

Refer to "Documentation" for a step-by-step guide.
