Dockerized Apache Spark + Cassandra
This image runs Apache Spark in standalone mode co-resident with the Apache Cassandra distributed database.
The image is based on https://hub.docker.com/_/cassandra/ with some additions to get Apache Spark running.
Ideally, run it with docker-compose (see the provided yml file). This ensures that the master and a worker node start on the same network so they can see each other:
docker-compose --project-name spark-cassandra build
docker-compose --project-name spark-cassandra up -d
With docker-compose, adding more worker nodes is as easy as:
docker-compose --project-name spark-cassandra scale spark-cassandra-node=3
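Newer docker-compose releases deprecate the standalone scale command in favor of the --scale flag on up. Below is a minimal sketch of a wrapper around that form, assuming the same project and service names as the command above. The function name scale_workers and the COMPOSE override variable are hypothetical and only used for illustration.

```shell
# Hypothetical helper: set the number of worker containers.
# COMPOSE is an illustrative override for the compose binary
# (it defaults to docker-compose).
scale_workers() {
  count="$1"
  # Reject an empty, non-numeric, or zero worker count before calling compose.
  case "$count" in
    ''|*[!0-9]*|0)
      echo "usage: scale_workers <positive integer>" >&2
      return 1
      ;;
  esac
  "${COMPOSE:-docker-compose}" --project-name spark-cassandra \
    up -d --scale "spark-cassandra-node=$count"
}
```

Calling scale_workers 3 should then be equivalent to the scale command above.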
To run the containers without docker-compose, the following commands should do the trick, assuming that a Docker network called sparkcassandra_default already exists:
docker run -d --rm --network sparkcassandra_default -e "SPARK_MODE=master" -e "CASSANDRA_SEEDS=spark-cassandra-master" --name spark-cassandra-master dawidnowak/spark-cassandra:2.1
docker run -d --rm --network sparkcassandra_default -e "CASSANDRA_SEEDS=spark-cassandra-master" --name spark-node1 dawidnowak/spark-cassandra:2.1
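The docker run commands above fail if the network is missing. Here is a minimal sketch for creating it only when needed, using docker network inspect and docker network create. The function name ensure_network is hypothetical, and the default network name is taken from the commands above.

```shell
# Hypothetical helper: create the Docker network only if it does not exist yet.
ensure_network() {
  name="${1:-sparkcassandra_default}"
  # `docker network inspect` exits non-zero when the network is unknown.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null && echo "network $name created"
  fi
}
```

Run ensure_network once before the docker run commands. If you already started the stack with docker-compose, the network should be there and the helper only reports it.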
Execute nodetool to check the status of your Apache Cassandra cluster:
docker exec -ti spark-cassandra-master nodetool status
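In the nodetool status output, each node line starts with a two-letter code, and UN means Up/Normal. As a rough sketch, the helper below counts those lines so you can tell at a glance whether all nodes have joined. The function name count_up_nodes is hypothetical.

```shell
# Hypothetical helper: count nodes reported as Up/Normal ("UN") in
# `nodetool status` output read from standard input.
count_up_nodes() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # `|| true` keeps the exit status at 0 in that case.
  grep -c '^UN' || true
}

# Example (needs the running master container, so it is left commented out):
# docker exec spark-cassandra-master nodetool status | count_up_nodes
```

The example drops the -ti flags because the output is piped rather than shown in an interactive terminal.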
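The Spark web console can be accessed on port 8080 of the master. Replace the placeholder below with the hostname or IP address of your master container: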
firefox http://<<spark-master-cassandra>>:8080
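If you have no browser at hand, a quick reachability check with curl can confirm that the console is up. This is only a sketch: the function name spark_ui_up is hypothetical, and it assumes that port 8080 of the master is reachable from the machine where you run it.

```shell
# Hypothetical helper: report whether the Spark master web UI answers on port 8080.
# The host argument is whatever address resolves to your master container.
spark_ui_up() {
  host="$1"
  # -f fails on HTTP errors, -sS stays quiet except for errors,
  # --max-time 5 gives up after five seconds.
  if curl -fsS -o /dev/null --max-time 5 "http://$host:8080"; then
    echo "Spark UI reachable at http://$host:8080"
  else
    echo "Spark UI not reachable at http://$host:8080" >&2
    return 1
  fi
}
```

Call it with the master's address as the only argument, for example spark_ui_up localhost if the port is published on your machine.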