Introduction

This project analyzes a publicly available Toronto Parking Tickets dataset using Apache Spark and Scala. It produces a JAR file that can be submitted to Apache Spark in standalone or cluster mode, for example on Google Dataproc or Amazon EMR. The application relies heavily on Resilient Distributed Datasets (RDDs), Map/Reduce, and ETL (extract, transform, load).
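
To make the Map/Reduce and ETL ideas concrete, here is a minimal sketch using plain Scala collections (Scala 2.13) rather than Spark, so it runs without a cluster. It is not taken from the project: the object name TicketCountSketch, the function countByInfraction, the sample rows, and the three-column "date,infractionCode,fineAmount" layout are all assumptions for illustration, and the real dataset's columns may differ.

```scala
object TicketCountSketch {

  // Hypothetical record layout (an assumption, not the real dataset schema):
  //   "date,infractionCode,fineAmount"
  def countByInfraction(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split(",", -1))                // extract: split each raw line into fields
      .filter(_.length >= 3)                // transform: drop malformed rows
      .map(fields => (fields(1).trim, 1))   // map: emit (infractionCode, 1)
      .groupMapReduce(_._1)(_._2)(_ + _)    // reduce: sum the counts per code

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, including one malformed line.
    val sample = Seq(
      "20160101,5,30",
      "20160101,29,15",
      "bad row",
      "20160102,5,30"
    )
    countByInfraction(sample).foreach { case (code, n) =>
      println(s"infraction $code: $n ticket(s)")
    }
  }
}
```

On a Spark RDD the same shape would typically be written with map followed by reduceByKey(_ + _), with Spark distributing the work across partitions instead of a single in-memory collection.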