Final Project for Dezyre class -- contains Hive, Pig, and MapReduce scripts processing expense data from 2011 Medicare database.

rchen314/HadoopFinalAssignmentHealth

HadoopFinalAssignmentHealth2

Final Project for Dezyre class -- contains Hive, Pig, and MapReduce scripts processing data from 2011 Medicare database.

List of files:

inpatient_small.csv                    Sample of dataset used

Robert Chen Final Explanation.pdf      More detailed explanation of project

mapreduce/

   mapreduce_notes.txt                 Notes explaining the MapReduce portion

   InPatient.java                      Code for finding average cost per procedure

   InPatientState.java                 Code for finding average cost per state

   output_by_procedure.txt             Output when running InPatient.java

   output_by_state.txt                 Output when running InPatientState.java

   InPatient.tar                       Full Eclipse workspace to run InPatient

   InPatientState.tar                  Full Eclipse workspace to run InPatientState
  
   opencsv-2.2.jar                     Custom jar file for processing CSV (see mapreduce_notes.txt)
  
   sortProcedure.py                    Python script to sort "output_by_procedure.txt" by cost
  
   sortState.py                        Python script to sort "output_by_state.txt" by cost
  
   output_by_procedure.txt.sorted      Sorted output -- from most expensive procedure to least
  
   output_by_state.txt.sorted          Sorted output -- from most expensive state to least
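
The averaging and sorting steps listed above can be sketched in a few lines of plain Python. This is only an illustration, not the code in InPatient.java or sortProcedure.py: the column order (state, procedure, cost), the sample rows, and the function names `average_by_key` and `sort_by_cost` are all assumptions made for the example, and the real dataset layout may differ.

```python
import csv
import io
from collections import defaultdict


def average_by_key(rows, key_index, cost_index):
    """Group rows by one column and average another, as a reducer would."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of costs, row count]
    for row in rows:
        entry = totals[row[key_index]]
        entry[0] += float(row[cost_index])
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def sort_by_cost(averages):
    """Return (key, average) pairs ordered from most to least expensive."""
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical sample rows: state, procedure, cost.
# This is NOT the real layout of inpatient_small.csv.
SAMPLE = (
    'AL,"PROC A, WITH CC",100.0\n'
    'AL,PROC B,400.0\n'
    'NY,"PROC A, WITH CC",500.0\n'
)

# csv.reader keeps the quoted field with its embedded comma as one column.
rows = list(csv.reader(io.StringIO(SAMPLE)))

by_state = sort_by_cost(average_by_key(rows, key_index=0, cost_index=2))
by_procedure = sort_by_cost(average_by_key(rows, key_index=1, cost_index=2))

for key, avg in by_state:
    print(f"{key}\t{avg:.2f}")
```

The sample includes a quoted field containing a comma to show why a CSV-aware parser is needed rather than a plain split on commas.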

pig/

   pig_final_assignment.pig            Pig commands used to run the queries, with their results

hive/

   hive_final_assignment.q             Hive commands used to run the queries, with their results

   csv_serde-0.9.1.jar                 Custom SerDe used for processing CSV (see .q file for details on use)
