Retail-Sales-Analytics-Using-Apache-Spark

This project analyzes a retail dataset using Apache Spark. Spark enables large-scale data analysis by processing data in parallel, making optimal use of the available threads and cores, so it can improve performance on a cluster as well as on a single machine. The analysis is implemented using PySpark.

PYSPARK

PySpark was released to bring Apache Spark and Python together: it is the Python API for Spark. It lets you work with Resilient Distributed Datasets (RDDs) from Python and, along with writing Spark applications using the Python APIs, it provides the PySpark shell for interactively analyzing data in a distributed environment. PySpark supports most of Spark's features, such as Spark SQL, DataFrames, Streaming, MLlib (machine learning), and Spark Core.