This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

About

Practice Course on Big Data

Spark Configuration

Use the following command to start an interactive Jupyter PySpark session with Python 3.6 as the default version. Replace port_1 with the Jupyter notebook port and port_2 with the Spark UI port:

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_PYTHON=python3.6 PYSPARK_DRIVER_PYTHON_OPTS='notebook --ip=0.0.0.0 --port=port_1' pyspark --conf spark.ui.port=port_2 --driver-memory 512m --master yarn --num-executors 2 --executor-cores 1
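To make the role of the two placeholders explicit, the launch line can be assembled programmatically. This is a minimal sketch, not part of the course tooling: the function names `build_pyspark_launch` and `as_shell_line` are hypothetical, and the ports 8888 and 4040 in the usage note are example values only.

```python
import shlex


def build_pyspark_launch(notebook_port, ui_port, python="python3.6"):
    """Return (env, argv) mirroring the launch command shown above."""
    # Environment variables that make `pyspark` start Jupyter as the driver.
    env = {
        "PYSPARK_DRIVER_PYTHON": "jupyter",
        "PYSPARK_PYTHON": python,
        "PYSPARK_DRIVER_PYTHON_OPTS":
            "notebook --ip=0.0.0.0 --port={}".format(notebook_port),
    }
    # Arguments passed to pyspark itself; ui_port fills in port_2.
    argv = [
        "pyspark",
        "--conf", "spark.ui.port={}".format(ui_port),
        "--driver-memory", "512m",
        "--master", "yarn",
        "--num-executors", "2",
        "--executor-cores", "1",
    ]
    return env, argv


def as_shell_line(env, argv):
    """Render env and argv as a single, safely quoted shell command line."""
    assignments = " ".join(
        "{}={}".format(name, shlex.quote(value)) for name, value in env.items()
    )
    return assignments + " " + " ".join(shlex.quote(arg) for arg in argv)
```

Calling `as_shell_line(*build_pyspark_launch(8888, 4040))` yields the command above with port_1 set to 8888 and port_2 set to 4040.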

Add the following port-forwarding rule to your ssh command, using the same port_1 as above:

-L port_1:localhost:port_1 
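As a rough illustration of where the forwarding rule sits in a complete ssh invocation, the sketch below builds the argument list. The function name `ssh_forward_command` and the `user` and `host` parameters are hypothetical; the actual login and cluster address are not given in this document.

```python
def ssh_forward_command(port, user, host):
    """Build an ssh argv that forwards a local port to the same port remotely.

    `user` and `host` are placeholders for your own cluster credentials.
    """
    # -L local_port:localhost:remote_port makes the remote notebook
    # reachable on the local machine at the same port number.
    forward_rule = "{0}:localhost:{0}".format(port)
    return ["ssh", "-L", forward_rule, "{}@{}".format(user, host)]
```

The returned list can be passed to `subprocess.run` or joined with spaces for a terminal.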

Open the following URL in your favourite browser:

Spark Streaming with Kafka requires an extra flag:

--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
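Once the session is started with this package, a Kafka topic can be read as a streaming DataFrame. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the broker address and topic name must be supplied by you, since neither appears in this document.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for the Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # e.g. "host:9092"
        "subscribe": topic,
        "startingOffsets": "latest",  # only read records that arrive from now on
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with key and value decoded as strings.

    `spark` is the SparkSession that the PySpark notebook already provides.
    """
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers key and value as binary; cast them for readability.
    return raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```

The result still has to be started with `writeStream` before any data flows.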

See more details at:

The Spark Cassandra connector requires the following two flags:

--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
--conf spark.cassandra.connection.host=brain-node1
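With both flags set, a Cassandra table can be loaded through the connector's DataFrame source. This is a sketch, not the course's own code: the function name `read_cassandra_table` is hypothetical, and the keyspace and table names are placeholders to replace with your own.

```python
def read_cassandra_table(spark, keyspace, table):
    """Load a Cassandra table as a DataFrame via the DataStax connector.

    `spark` is the notebook's SparkSession; the connection host is taken
    from the spark.cassandra.connection.host setting passed at launch.
    """
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
```

For example, `read_cassandra_table(spark, "my_keyspace", "my_table").show(5)` would print the first rows, assuming such a table exists.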

Useful Spark documentation links: