Enable Log Aggregation for Spark #218

maltesander · 2023-02-27T08:55:29Z

See stackabletech/hbase-operator#291 and stackabletech/docker-images#283 for reference.

This is part of stackabletech/issues#288

Implementation details

Additional Java options are added to the Spark submit job, the Spark driver, the Spark executor, and the Spark history server, e.g. to set the log4j configuration file or the classpath.

Spark submit job

The Java options of the Spark submit job are set with the environment variable SPARK_SUBMIT_OPTS. In cluster mode, this environment variable is the only way to pass them. It is not part of Spark's public API, but the logging integration test will break if its behavior ever changes. The user cannot override this environment variable in the cluster specification.
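The "cannot override" rule can be illustrated with a minimal sketch. It is not the operator's actual code: the function name `build_submit_env`, the `OPERATOR_MANAGED_ENV` table, and the option value (borrowed from the executor command line shown further down) are assumptions made for illustration only.

```python
# Hypothetical sketch: operator-managed variables always win over user-supplied ones.
# The option value is an assumption; the real SPARK_SUBMIT_OPTS content may differ.
OPERATOR_MANAGED_ENV = {
    "SPARK_SUBMIT_OPTS": "-Dlog4j.configurationFile=/stackable/log_config/log4j2.properties",
}


def build_submit_env(user_env: dict[str, str]) -> dict[str, str]:
    """Merge user-supplied variables with operator-managed ones."""
    merged = dict(user_env)
    # Applied last, so a user-provided SPARK_SUBMIT_OPTS is silently replaced.
    merged.update(OPERATOR_MANAGED_ENV)
    return merged


env = build_submit_env({"SPARK_SUBMIT_OPTS": "-Xmx1g", "MY_VAR": "1"})
print(env["SPARK_SUBMIT_OPTS"])
```

Applying the managed values last keeps the logging setup intact regardless of what the cluster specification contains, at the cost of ignoring a user-supplied value without warning.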

Spark driver

The Java options of the Spark driver are set with the configuration spark.driver.defaultJavaOptions. The user can add extra Java options with spark.driver.extraJavaOptions in the cluster specification at spec.sparkConf. The Spark configuration contains both options and is located in the file spark.properties in a generated ConfigMap named spark-drv-<hash>-conf-map. This configuration file is then used in the driver:

$ kubectl logs spark-cluster-<hash>-driver -c spark
...
+ exec /usr/bin/tini -s -- /stackable/spark/bin/spark-submit --conf spark.driver.bindAddress=... --deploy-mode client --properties-file /opt/spark/conf/spark.properties ...
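How both options end up side by side in spark.properties can be sketched as follows. This is an illustration under assumptions, not the operator's implementation: the helper names, the default option value, and the rule that the operator's value for spark.driver.defaultJavaOptions takes precedence are all hypothetical, and Java properties escaping is deliberately left out.

```python
# Hypothetical sketch: combine the operator default with the user's spec.sparkConf
# and render the result as simple key=value lines (no properties escaping).
OPERATOR_CONF = {
    # Assumed value, mirroring the log4j option visible on the executor command line.
    "spark.driver.defaultJavaOptions":
        "-Dlog4j.configurationFile=/stackable/log_config/log4j2.properties",
}


def build_driver_conf(user_spark_conf: dict[str, str]) -> dict[str, str]:
    """Return the user's sparkConf with the operator defaults applied on top."""
    conf = dict(user_spark_conf)
    conf.update(OPERATOR_CONF)  # assumption: the operator value wins on conflict
    return conf


def render_properties(conf: dict[str, str]) -> str:
    """Render a configuration mapping as sorted key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in sorted(conf.items()))


props = render_properties(
    build_driver_conf({"spark.driver.extraJavaOptions": "-XX:+UseG1GC"})
)
print(props)
```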

Spark executor

The Java options of the Spark executor are set with the configuration spark.executor.defaultJavaOptions. The user can add extra Java options with spark.executor.extraJavaOptions in the cluster specification at spec.sparkConf. Both options are added to the command line of the executor:

$ kubectl logs spark<job>-<hash>-exec-1 -c spark
...
+ exec /usr/bin/tini -s -- /usr/lib/jvm/jre-11/bin/java ... -Dlog4j.configurationFile=/stackable/log_config/log4j2.properties <extraJavaOptions> ...
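The ordering seen in the log above, default options first and the user's extra options after them, matches Spark's documented behavior of prepending defaultJavaOptions to extraJavaOptions. A minimal sketch of that combination, with a hypothetical function name and example values:

```python
import shlex


def effective_java_options(conf: dict[str, str]) -> list[str]:
    """Return the executor JVM arguments: default options first, then extra options."""
    default = conf.get("spark.executor.defaultJavaOptions", "")
    extra = conf.get("spark.executor.extraJavaOptions", "")
    # shlex.split tokenizes each option string the way a POSIX shell would.
    return shlex.split(default) + shlex.split(extra)


args = effective_java_options({
    "spark.executor.defaultJavaOptions":
        "-Dlog4j.configurationFile=/stackable/log_config/log4j2.properties",
    # Example user value; any JVM flag could stand here.
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
})
print(args)
```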

Spark history server

The Java options of the Spark history server are set with the environment variable SPARK_HISTORY_OPTS. This environment variable cannot be overridden by the user.


maltesander added the release-note, type/feature-new, and release/2023-04 labels Feb 27, 2023

This was referenced Feb 27, 2023

Log Aggregation stackabletech/issues#288

Closed

Configure pod logging #24

Closed

siegfriedweber self-assigned this Mar 2, 2023

This was referenced Mar 9, 2023

Add Vector and dependencies required for logging to the Spark-K8S images stackabletech/docker-images#342

Merged

Implement the Atomic trait for VolumeMount stackabletech/operator-rs#566

Closed

This was referenced Mar 30, 2023

[Merged by Bors] - Merging and validation of the configuration refactored #223

Closed

[Merged by Bors] - Logging #226

Closed

siegfriedweber closed this as completed Apr 4, 2023
