The goal of this project is to:
- Create Docker containers that run Spark on top of HDFS
- Use Prometheus to collect metrics from Spark applications and Node Exporter (a sample scrape configuration follows this list)
- Use Grafana to display the collected metrics
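
For orientation, a minimal Prometheus scrape configuration along these lines would cover both targets. The job names, service hostnames, and ports below are assumptions based on a typical docker-compose setup, not values confirmed by this repository:

```yaml
# prometheus.yml -- minimal sketch; hostnames and ports are assumptions
scrape_configs:
  - job_name: "spark"
    metrics_path: "/metrics/prometheus"   # Spark 3.x PrometheusServlet endpoint
    static_configs:
      - targets: ["spark-master:8080"]    # Spark master web UI port
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]   # Node Exporter default port
```
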
- Hadoop configurations for core-site.xml and hadoop-env.sh are set here (a sample core-site.xml snippet follows this list).
- Spark configurations for spark-env.sh and spark-defaults.conf are set here (a sample spark-defaults.conf snippet follows this list).
- Environment variables for the Spark/Hadoop versions and library paths are set here (a sample ENV block follows this list).
- The Spark version in use is 3.0.1, and the HDFS version is 3.2.0.
- For all metrics available for Spark monitoring, see here.
- The containerized environment consists of a Master, a Worker, a DataNode, a NameNode and a SecondaryNameNode.
- To track metrics across invocations of a Spark application, appName must be set and spark.metrics.namespace fixed accordingly; otherwise spark.metrics.namespace defaults to spark.app.id, which changes on every invocation of the app (see the snippet after this list).
- The main Python application is app.py, an example that computes the number pi. For your own application or use of HDFS, make changes accordingly (the core pattern is sketched after this list).
- The Dockerfile for Spark/Hadoop is also available here, so it can be added to the docker-compose.yaml file as seen here (a sample service entry follows this list).
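
On the Hadoop side, a core-site.xml along these lines points Spark at HDFS. The namenode hostname and port are assumptions chosen to match the container names above:

```xml
<!-- core-site.xml -- minimal sketch; hostname and port are assumptions -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
```
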
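On the Spark side, a spark-defaults.conf sketch like the following would expose metrics through the PrometheusServlet sink that ships with Spark 3.x. Treat the exact values as assumptions about how this repository is wired:

```properties
# spark-defaults.conf -- minimal sketch; values are assumptions
spark.master                    spark://spark-master:7077
spark.metrics.namespace         ${spark.app.name}
# Built-in Prometheus sink (available since Spark 3.0)
spark.metrics.conf.*.sink.prometheusServlet.class  org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path   /metrics/prometheus
spark.ui.prometheus.enabled     true
```
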
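The version and path variables could look roughly like this in the Dockerfile; the install paths are assumptions, while the version numbers come from this README:

```dockerfile
# Dockerfile excerpt -- sketch; install paths are assumptions
ENV SPARK_VERSION=3.0.1 \
    HADOOP_VERSION=3.2.0 \
    SPARK_HOME=/opt/spark \
    HADOOP_HOME=/opt/hadoop
ENV PATH="${SPARK_HOME}/bin:${HADOOP_HOME}/bin:${PATH}"
```
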
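Concretely, fixing the metrics namespace might look like this inside the application; the application name "pi-example" is only illustrative:

```python
from pyspark.sql import SparkSession

# Pin the metrics namespace to the app name so metric series keep the same
# identity across runs; the default namespace is spark.app.id, which changes
# on every invocation.
spark = (SparkSession.builder
         .appName("pi-example")
         .config("spark.metrics.namespace", "pi-example")
         .getOrCreate())
```
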
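A pi-computing example usually follows the standard Monte Carlo pattern below. This is a sketch of that pattern, not necessarily the exact contents of app.py:

```python
import random
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pi-example").getOrCreate()

NUM_SAMPLES = 1_000_000

def inside(_):
    # Sample a point in the unit square and count it if it lands
    # inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1 else 0

count = (spark.sparkContext
         .parallelize(range(NUM_SAMPLES))
         .map(inside)
         .reduce(add))

print(f"Pi is roughly {4.0 * count / NUM_SAMPLES}")
spark.stop()
```
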
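Wiring that Dockerfile into docker-compose.yaml could look roughly like this; the service name, build context, and ports are assumptions:

```yaml
# docker-compose.yaml excerpt -- sketch; name, context, and ports are assumptions
services:
  spark-master:
    build: ./docker/spark-hadoop   # directory containing the Spark/Hadoop Dockerfile
    ports:
      - "8080:8080"   # Spark master web UI
      - "7077:7077"   # Spark master RPC
```
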
Assuming that Docker is installed, execute the following command to build and run the containers:
```bash
docker-compose build && docker-compose up
```
- Example dashboard for Spark Metrics:
- All available services from Service Discovery in Prometheus:
Please file issues if you run into any problems.