Retail-Sales-Analytics-Using-Apache-Spark

This project analyzes a retail dataset using Apache Spark. Spark enables large-scale data analysis by processing data in parallel, making optimal use of the available threads and cores, so it can improve performance on a cluster as well as on a single machine. The analysis is implemented using PySpark.

PYSPARK

PySpark was released to bring Apache Spark and Python together: it is the Python API for Spark. It lets you work with Resilient Distributed Datasets (RDDs) from Python and, along with writing Spark applications using the Python APIs, it provides the PySpark shell for interactively analyzing data in a distributed environment. PySpark supports most of Spark's features, such as Spark SQL, DataFrames, Streaming, MLlib (machine learning), and Spark Core.