Code for HKUST COMP4651 (Fall 2018): Cloud Computing and Big Data Systems
Assignment 1: Benchmarking AWS EC2 CPU, memory, and network performance across instance types and cluster locations
Assignment 2: Java implementation of copying files between HDFS and the local file system while maintaining the checksum
Assignment 3: MapReduce programming in Java for bigram counting and frequency calculation, using the Pairs and Stripes design patterns
Assignment 4: Apache Spark
Assignment 5: Power Plant Machine Learning Pipeline Application with Apache Spark
DataFrame Live Programming: Hands-on Spark DataFrame live-programming tutorial from the Spark SF Meetup 2016
Spark Tutorial: Apache Spark tutorial heavily adapted from the Spark MOOC
EMR Test: Tests for Amazon EMR and S3
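
For readers unfamiliar with the Pairs and Stripes patterns named in Assignment 3, the following is a minimal in-memory sketch of the idea. It is plain Python rather than the Java MapReduce code in this repository, and it is not taken from the assignment. The function names, the whitespace tokenization, and the reading of "frequency" as the relative frequency of the second word given the first are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def bigrams(line):
    """Return adjacent word pairs from one line (assumed whitespace tokenization)."""
    words = line.split()
    return zip(words, words[1:])


def count_pairs(lines):
    """Pairs pattern: a single counter keyed by the whole (left, right) bigram."""
    counts = Counter()
    for line in lines:
        for left, right in bigrams(line):
            # A mapper would emit ((left, right), 1); a reducer would sum the ones.
            counts[(left, right)] += 1
    return counts


def count_stripes(lines):
    """Stripes pattern: one map per left word, holding counts of its right words."""
    stripes = defaultdict(Counter)
    for line in lines:
        for left, right in bigrams(line):
            # A mapper would emit (left, {right: 1}); a reducer would merge stripes.
            stripes[left][right] += 1
    return stripes


def frequencies_from_pairs(pair_counts):
    """Assumed definition: count(left, right) / count(left, *)."""
    totals = Counter()
    for (left, _), n in pair_counts.items():
        totals[left] += n
    return {pair: n / totals[pair[0]] for pair, n in pair_counts.items()}


def frequencies_from_stripes(stripes):
    """Same assumed definition, computed from one stripe at a time."""
    result = {}
    for left, stripe in stripes.items():
        total = sum(stripe.values())
        for right, n in stripe.items():
            result[(left, right)] = n / total
    return result
```

Both patterns yield the same counts. With Stripes, the total for a left word is available inside a single stripe, whereas with Pairs it has to be aggregated across keys, which is the usual trade-off between the two designs.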