Course Info: University of California, Berkeley: Introduction to Big Data with Apache Spark
University website:
Piazza discussion group:
Course Content:
Week 1: Big Data and Data Science
- Introduction to Big Data and Data Science
- Performing Data Science and Preparing Data
- Setting up the Course Software Environment
Week 2: Introduction to Apache Spark
- Big Data, Hardware Trends, and the History of Apache Spark
- Spark Essentials
- Lab 1: Learning Apache Spark
Week 3: Data Management
- Semi-Structured Data
- Structured Data
- Lab 2: Web Server Log Analysis with Apache Spark
Week 4: Data Quality, Exploratory Data Analysis, and Machine Learning
- Data Quality
- Exploratory Data Analysis
- Machine Learning: Spark's machine learning library, MLlib
- Lab 3: Text Analysis and Entity Resolution
Week 5: Data Management
- Lab 4: Introduction to Machine Learning with Apache Spark