Introduction

This project analyzes a publicly available Toronto Parking Tickets dataset using Apache Spark and Scala. It produces a JAR file that can be submitted to Apache Spark in standalone or cluster mode, for example on Google Dataproc or Amazon EMR. The application makes heavy use of Resilient Distributed Datasets (RDDs), Map/Reduce, and ETL.
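
As a rough illustration of how RDDs and Map/Reduce fit together in a job like this, here is a minimal sketch. It is not the project's actual code: the object name `TicketCountSketch`, the column positions, and the comma-only CSV parsing (which ignores quoted fields) are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

object TicketCountSketch {
  // Hypothetical column positions; adjust them to the real CSV header.
  val InfractionCol = 3
  val FineCol = 4

  /** Extract (infraction, fine) from one CSV line; None if the row is malformed.
    * Naive split on commas: quoted fields containing commas are not handled. */
  def parseLine(line: String): Option[(String, Double)] = {
    val cols = line.split(",", -1)
    if (cols.length <= math.max(InfractionCol, FineCol)) None
    else
      Try(cols(FineCol).trim.toDouble).toOption
        .map(fine => (cols(InfractionCol).trim, fine))
  }

  def main(args: Array[String]): Unit = {
    // The master URL is left unset so that spark-submit can supply it.
    val conf = new SparkConf().setAppName("TicketCountSketch")
    val sc = new SparkContext(conf)
    try {
      val totals = sc.textFile(args(0))              // extract: read raw lines into an RDD
        .flatMap(line => parseLine(line).toList)     // transform: drop the header and bad rows
        .reduceByKey(_ + _)                          // reduce: sum fines per infraction
      totals.sortBy(pair => -pair._2)
        .take(10)
        .foreach { case (infraction, total) => println(s"$infraction\t$total") }
    } finally {
      sc.stop()
    }
  }
}
```

Once packaged into a JAR, a job of this shape could be launched with `spark-submit --class TicketCountSketch --master <master-url> <jar> <path-to-csv>`, where the JAR and CSV paths depend on your own build and data location.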

Dataset Source

The entire dataset is available at this link