nonflame/big-data-course

 
 

About

Practice Course on Big Data

Spark Configuration

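Use the following command to start an interactive Jupyter PySpark session (it sets Python 3.6 as the Python version used by PySpark). Replace port_1 (the notebook port) and port_2 (the Spark UI port) with free port numbers: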

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_PYTHON=python3.6 PYSPARK_DRIVER_PYTHON_OPTS='notebook --ip=0.0.0.0 --port=port_1' pyspark --conf spark.ui.port=port_2 --driver-memory 512m --master yarn --num-executors 2 --executor-cores 1
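The placeholders port_1 and port_2 have to be filled in by hand each time. As a rough sketch of how the same command line could be assembled, the helper below builds it from two port numbers. The function name build_pyspark_command and the example ports 8888 and 4040 are illustrative assumptions, not part of the course material.

```python
import shlex


def build_pyspark_command(notebook_port, spark_ui_port):
    """Assemble the launch command shown above for the given ports.

    Hypothetical helper: only the flags and values come from the README.
    """
    env = {
        "PYSPARK_DRIVER_PYTHON": "jupyter",
        "PYSPARK_PYTHON": "python3.6",
        "PYSPARK_DRIVER_PYTHON_OPTS": f"notebook --ip=0.0.0.0 --port={notebook_port}",
    }
    args = [
        "pyspark",
        "--conf", f"spark.ui.port={spark_ui_port}",
        "--driver-memory", "512m",
        "--master", "yarn",
        "--num-executors", "2",
        "--executor-cores", "1",
    ]
    # Quote each value so the option string with spaces stays one shell word.
    env_part = " ".join(f"{name}={shlex.quote(value)}" for name, value in env.items())
    args_part = " ".join(shlex.quote(arg) for arg in args)
    return f"{env_part} {args_part}"


if __name__ == "__main__":
    # 8888 and 4040 are example ports only; pick ports that are free on your node.
    print(build_pyspark_command(8888, 4040))
```

The printed line can be pasted into a shell on the cluster node.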

Add the following port-forwarding rule to your ssh command:

-L port_1:localhost:port_1 
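For illustration, the forwarding rule can be put into a complete ssh invocation as sketched below. The helper name build_ssh_tunnel_command, the login user@cluster-gateway, and port 8888 are placeholders assumed for this example; use your own account, host, and the notebook port you chose.

```python
import shlex


def build_ssh_tunnel_command(port, login):
    """Return an ssh command that forwards a local port to the same port on the remote node.

    Hypothetical helper: `login` is whatever user@host you normally connect with.
    """
    forward = f"{port}:localhost:{port}"
    return " ".join(shlex.quote(part) for part in ["ssh", "-L", forward, login])


if __name__ == "__main__":
    # Example values only.
    print(build_ssh_tunnel_command(8888, "user@cluster-gateway"))
```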

Open the following URL in your favourite browser:

Spark Streaming with Kafka requires an extra flag:

--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
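With this package loaded, a topic can be read from the notebook through the Structured Streaming kafka source. The function below is a minimal sketch, not the course's own code: the name read_kafka_topic is hypothetical, and spark is assumed to be the SparkSession that the pyspark shell creates for the notebook.

```python
def read_kafka_topic(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with the key and value of each Kafka record as strings.

    `spark` is an existing SparkSession; the broker list and topic are supplied by the caller.
    """
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")
        .load()
    )
    # Kafka delivers key and value as binary columns; cast them to readable strings.
    return raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
```

In a notebook this might be used as read_kafka_topic(spark, "broker-host:9092", "my-topic"), where the broker address and topic name are placeholders for your cluster's values. The resulting DataFrame can then be written out with writeStream, for example to the console sink.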

See more details at:

Spark with Cassandra requires the following two flags:

--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
--conf spark.cassandra.connection.host=brain-node1
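With the connector loaded and the connection host set by the flags above, a Cassandra table can be read as a DataFrame. The sketch below assumes spark is the SparkSession from the pyspark shell. The function name read_cassandra_table is hypothetical, and the keyspace and table are whatever exists on your cluster.

```python
def read_cassandra_table(spark, keyspace, table):
    """Load a Cassandra table as a DataFrame through the Spark Cassandra connector.

    The connection host is taken from spark.cassandra.connection.host,
    so it does not need to be repeated here.
    """
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
```

For example, read_cassandra_table(spark, "my_keyspace", "my_table").show(5) would print the first rows. Both names in this call are placeholders.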

Useful Spark documentation links:
