dbresson/spark-standalone

#Spark 1.6.1 (Standalone)

##Start a standalone cluster

###Master

docker run --name spark-master -d --net=host --restart=always skrityak/spark-standalone master

Check that the master is running by opening its web UI (port 8080) on the host machine. That page also shows the Spark master URL, which you need to start the workers.
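If you prefer the command line, the master URL can be pulled out of the web UI page with a small filter. This is only a sketch: `extract_master_url` is a hypothetical helper name, and it assumes the page text contains the URL in the form `spark://host:port`.

```shell
# Hypothetical helper: print the first spark://host:port URL found on stdin.
extract_master_url() {
  grep -oE 'spark://[A-Za-z0-9._-]+:[0-9]+' | head -n 1
}

# Example (not run here): read the URL from the master web UI page.
# MASTER_URL=$(curl -s "http://${MASTER_HOST_OR_IP}:8080" | extract_master_url)
```

The filter only reads text, so it can be tried on any saved copy of the page before pointing it at a live master.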

###Worker1

Get the master URL (e.g. spark://192.168.1.50:7077), then start a worker on the default ports (7078 for the worker, 8081 for its web UI).

docker run --name spark-worker1 -d --net=host --restart=always skrityak/spark-standalone worker spark://${MASTER_HOST_OR_IP}:7077

###Worker2

Start another worker on different ports by setting environment variables.

docker run --name spark-worker2 -d --net=host --restart=always -e SPARK_WORKER_PORT=7079 -e SPARK_WORKER_WEBUI_PORT=8082 skrityak/spark-standalone worker spark://${MASTER_HOST_OR_IP}:7077
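With `--net=host`, every additional worker on the same machine needs its own pair of ports. The sketch below continues the numbering used for the two workers above (worker N gets 7077+N and 8080+N). That scheme is an assumption made for illustration, and `worker_ports` and `worker_command` are hypothetical helper names; any free ports will do.

```shell
# Print "<worker port> <web UI port>" for the Nth worker (N = 1, 2, ...).
# Worker 1 -> 7078 8081, worker 2 -> 7079 8082, and so on (assumed scheme).
worker_ports() {
  n=$1
  echo "$((7077 + n)) $((8080 + n))"
}

# Print (do not run) the docker command that starts the Nth worker.
worker_command() {
  n=$1
  master_url=$2
  # Split the two ports into $1 and $2.
  set -- $(worker_ports "$n")
  echo "docker run --name spark-worker${n} -d --net=host --restart=always" \
       "-e SPARK_WORKER_PORT=$1 -e SPARK_WORKER_WEBUI_PORT=$2" \
       "skrityak/spark-standalone worker ${master_url}"
}

# Example: show the command for a third worker.
worker_command 3 "spark://192.168.1.50:7077"
```

The helper prints the command instead of running it, so you can review it first and then paste it or pipe it to `sh`.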

###Run the SparkPi example

docker exec -it spark-master /opt/spark/bin/run-example SparkPi 10

The output contains many log lines, including one like "Pi is roughly 3.142448".
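Because the result is buried in log output, it can help to filter for it. A minimal sketch, assuming the result line has the form "Pi is roughly <number>" as quoted above; `pi_estimate` is a hypothetical helper name.

```shell
# Hypothetical helper: print only the number from a "Pi is roughly ..." line on stdin.
pi_estimate() {
  sed -n 's/.*Pi is roughly \([0-9.]*\).*/\1/p'
}

# Example (not run here): -it is dropped because the output is piped,
# and stderr is merged in so the log lines pass through the filter too.
# docker exec spark-master /opt/spark/bin/run-example SparkPi 10 2>&1 | pi_estimate
```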

###Submit the SparkPi example from any node

docker exec -it spark-worker1 /opt/spark/bin/spark-submit --master spark://${MASTER_HOST_OR_IP}:7077 /opt/spark/examples/src/main/python/pi.py 10

##Environment Variables

Spark reads environment variables in its start scripts, so you can set them to change the IP address and ports. See http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts for the available variables.

This Docker image sets the following default values:

SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
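The image's start script is not reproduced here, so the following is only a sketch of the common `${VAR:-default}` pattern, not the image's actual code. It shows how a value passed with `-e` would take precedence over the defaults listed above; `show_spark_ports` is a hypothetical helper name.

```shell
# Print the four port settings: the value from the environment if set,
# otherwise the default listed above.
show_spark_ports() {
  echo "SPARK_MASTER_PORT=${SPARK_MASTER_PORT:-7077}"
  echo "SPARK_MASTER_WEBUI_PORT=${SPARK_MASTER_WEBUI_PORT:-8080}"
  echo "SPARK_WORKER_PORT=${SPARK_WORKER_PORT:-7078}"
  echo "SPARK_WORKER_WEBUI_PORT=${SPARK_WORKER_WEBUI_PORT:-8081}"
}

# Example: override one value in a subshell and leave the rest at their defaults.
( SPARK_WORKER_PORT=7079; show_spark_ports )
```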
