Enable Log Aggregation for Spark #218

Closed
Tracked by #288
maltesander opened this issue Feb 27, 2023 · 0 comments
Labels: release/2023-04, release-note, type/feature-new

maltesander commented Feb 27, 2023

See stackabletech/hbase-operator#291 and stackabletech/docker-images#283 for reference.

This is part of stackabletech/issues#288

Implementation details

Additional Java options are added to the Spark submit job, the Spark driver, the Spark executor, and the Spark history server, e.g. to set the log4j configuration file or the classpath.

Spark submit job

The Java options of the Spark submit job are set with the environment variable SPARK_SUBMIT_OPTS. In cluster mode, this environment variable is the only way to set them. It is not part of the public API, but the logging integration test will fail if its behavior ever changes. The user cannot override this environment variable in the cluster specification.

Spark driver

The Java options of the Spark driver are set with the configuration spark.driver.defaultJavaOptions. The user can add extra Java options with spark.driver.extraJavaOptions in the cluster specification at spec.sparkConf. Both options end up in the Spark configuration, which is stored in the file spark.properties in a generated ConfigMap named spark-drv-<hash>-conf-map. This configuration file is then passed to the driver:

$ kubectl logs spark-cluster-<hash>-driver -c spark
...
+ exec /usr/bin/tini -s -- /stackable/spark/bin/spark-submit --conf spark.driver.bindAddress=... --deploy-mode client --properties-file /opt/spark/conf/spark.properties ...
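
As an illustration of the user-facing side, the following shell sketch writes a fragment of a cluster specification that sets extra driver options under spec.sparkConf. Only the key names come from the description above. The option value -XX:+UseG1GC and the temporary file are assumptions chosen for the example, and a real specification needs further fields that are omitted here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical fragment of a cluster specification. Only spec.sparkConf and
# the key spark.driver.extraJavaOptions are taken from the text above; the
# value is an example JVM flag.
fragment=$(mktemp)
trap 'rm -f "$fragment"' EXIT

cat > "$fragment" <<'EOF'
spec:
  sparkConf:
    spark.driver.extraJavaOptions: "-XX:+UseG1GC"
EOF

# Show the fragment that would be merged into a full specification.
cat "$fragment"
```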

Spark executor

The Java options of the Spark executor are set with the configuration spark.executor.defaultJavaOptions. The user can add extra Java options with spark.executor.extraJavaOptions in the cluster specification at spec.sparkConf. Both options are added to the command line of the executor:

$ kubectl logs spark<job>-<hash>-exec-1 -c spark
...
+ exec /usr/bin/tini -s -- /usr/lib/jvm/jre-11/bin/java ... -Dlog4j.configurationFile=/stackable/log_config/log4j2.properties <extraJavaOptions> ...
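
For both the driver and the executor, Spark documents the defaultJavaOptions setting as being prepended to extraJavaOptions, which matches the order visible in the executor command line above. The following shell sketch reproduces that ordering from a properties file. The helper names get_prop and effective_java_opts, the sample file contents, and the -XX:+UseG1GC value are assumptions made for illustration. The parser handles only simple key=value lines and is not how Spark itself reads the file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the value of a key from a properties file with simple key=value lines.
# Escapes and line continuations are not handled.
get_prop() {
  local file=$1 key=$2
  # Match "key=" literally at the start of a line and print the remainder.
  awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file"
}

# Join the default and extra options in the documented order:
# defaultJavaOptions first, then extraJavaOptions.
effective_java_opts() {
  local file=$1 role=$2   # role is "driver" or "executor"
  local defaults extra
  defaults=$(get_prop "$file" "spark.${role}.defaultJavaOptions")
  extra=$(get_prop "$file" "spark.${role}.extraJavaOptions")
  # Insert a single space only when both parts are non-empty.
  echo "${defaults}${defaults:+${extra:+ }}${extra}"
}

# Hypothetical sample values, for illustration only.
props=$(mktemp)
trap 'rm -f "$props"' EXIT

cat > "$props" <<'EOF'
spark.executor.defaultJavaOptions=-Dlog4j.configurationFile=/stackable/log_config/log4j2.properties
spark.executor.extraJavaOptions=-XX:+UseG1GC
EOF

effective_java_opts "$props" executor
```

Run as-is, the script prints the default option followed by the extra option on a single line. A role with neither key set, such as driver in this sample file, yields an empty line.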

Spark history server

The Java options of the Spark history server are set with the environment variable SPARK_HISTORY_OPTS. This environment variable cannot be overridden by the user.
