[SPARK-23257][K8S] Kerberos Support for Spark on K8S #21669
Changes from all commits
@@ -722,7 +722,82 @@ with encryption, at least.

The Kerberos login will be periodically renewed using the provided credentials, and new delegation
tokens for supported services will be created.
## Secure Interaction with Kubernetes

When talking to Hadoop-based services secured with Kerberos, Spark needs to obtain delegation tokens
so that non-local processes can authenticate. In Kubernetes, these delegation tokens are stored in Secrets
that are shared by the Driver and its Executors. As such, there are three ways of submitting a Kerberos job:

In all cases you must either define the environment variable `HADOOP_CONF_DIR` or set
`spark.kubernetes.hadoop.configMapName`.

It is also important to note that the KDC needs to be visible from inside the containers.
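One way to sanity-check this, as a sketch (the pod name, image, and `<KDC_HOST>` placeholder below are illustrative, not part of this PR), is to resolve the KDC hostname from a throwaway pod inside the cluster:

```bash
# Run a temporary pod and check that the KDC hostname resolves from
# inside the cluster; a DNS failure here means the KDC is not visible.
kubectl run krb-dns-check --rm -it --image=busybox --restart=Never -- \
  nslookup <KDC_HOST>
```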
If a user wishes to use a remote `HADOOP_CONF_DIR` containing the Hadoop configuration files, this can be
achieved by setting `spark.kubernetes.hadoop.configMapName` to a pre-existing ConfigMap.
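As a sketch of how such a ConfigMap could be created with `kubectl` (the local directory path is only an example), every file in the Hadoop conf directory becomes a key in the ConfigMap:

```bash
# Bundle the Hadoop configuration files into a ConfigMap, then point
# spark.kubernetes.hadoop.configMapName at it when submitting.
kubectl create configmap <HCONF_CONFIG_MAP_NAME> --from-file=/etc/hadoop/conf
```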
ifilonenko marked this conversation as resolved.

Member: This paragraph is a great addition to the docs.
1. Submitting with a `kinit` that stores a TGT in the local ticket cache:
```bash
/usr/bin/kinit -kt <keytab_file> <username>/<krb5 realm>
/opt/spark/bin/spark-submit \
    --deploy-mode cluster \
    --class org.apache.spark.examples.HdfsTest \
    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    --conf spark.executor.instances=1 \
    --conf spark.app.name=spark-hdfs \
    --conf spark.kubernetes.container.image=spark:latest \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
    local:///opt/spark/examples/jars/spark-examples_<VERSION>.jar \
    <HDFS_FILE_LOCATION>
```

Contributor: Instead of using a command-line example with some configs that are really unrelated to the feature being explained, how about only explaining the configs that need to be set to enable the feature? Preferably using a table, like other config-related documents use.

Contributor: Personally I think having an example in the docs is good as well (that said, a table plus config documentation is also good to have, and the example shouldn't replace that need).
2. Submitting with a local keytab and principal:
```bash
/opt/spark/bin/spark-submit \
    --deploy-mode cluster \
    --class org.apache.spark.examples.HdfsTest \
    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    --conf spark.executor.instances=1 \
    --conf spark.app.name=spark-hdfs \
    --conf spark.kubernetes.container.image=spark:latest \
    --conf spark.kerberos.keytab=<KEYTAB_FILE> \
    --conf spark.kerberos.principal=<PRINCIPAL> \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
    local:///opt/spark/examples/jars/spark-examples_<VERSION>.jar \
    <HDFS_FILE_LOCATION>
```
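When submitting with a keytab as in (2), it can be worth confirming beforehand that the keytab actually contains the intended principal, for example with standard MIT Kerberos tooling (this check is an illustration, not part of the feature):

```bash
# List the principals (and key timestamps) stored in the keytab to
# verify that <PRINCIPAL> is present before submitting.
klist -kt <KEYTAB_FILE>
```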
3. Submitting with pre-populated secrets containing the delegation token, already existing within the namespace:
```bash
/opt/spark/bin/spark-submit \
    --deploy-mode cluster \
    --class org.apache.spark.examples.HdfsTest \
    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    --conf spark.executor.instances=1 \
    --conf spark.app.name=spark-hdfs \
    --conf spark.kubernetes.container.image=spark:latest \
    --conf spark.kubernetes.kerberos.tokenSecret.name=<SECRET_TOKEN_NAME> \
    --conf spark.kubernetes.kerberos.tokenSecret.itemKey=<SECRET_ITEM_KEY> \
    --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
    local:///opt/spark/examples/jars/spark-examples_<VERSION>.jar \
    <HDFS_FILE_LOCATION>
```
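For (3), the secret must already exist in the namespace before submission. A minimal sketch of creating it, assuming the serialized delegation token has been written to a local file `hadoop.token` (a hypothetical file name used only for illustration):

```bash
# Store the token file under <SECRET_ITEM_KEY> inside the secret
# <SECRET_TOKEN_NAME>; spark-submit references both names above.
kubectl create secret generic <SECRET_TOKEN_NAME> \
  --from-file=<SECRET_ITEM_KEY>=./hadoop.token
```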
3b. Submitting as in (3), but specifying a pre-created krb5 ConfigMap and a pre-created `HADOOP_CONF_DIR` ConfigMap:
```bash
/opt/spark/bin/spark-submit \
    --deploy-mode cluster \
    --class org.apache.spark.examples.HdfsTest \
    --master k8s://<KUBERNETES_MASTER_ENDPOINT> \
    --conf spark.executor.instances=1 \
    --conf spark.app.name=spark-hdfs \
    --conf spark.kubernetes.container.image=spark:latest \
    --conf spark.kubernetes.kerberos.tokenSecret.name=<SECRET_TOKEN_NAME> \
    --conf spark.kubernetes.kerberos.tokenSecret.itemKey=<SECRET_ITEM_KEY> \
    --conf spark.kubernetes.hadoop.configMapName=<HCONF_CONFIG_MAP_NAME> \
    --conf spark.kubernetes.kerberos.krb5.configMapName=<KRB_CONFIG_MAP_NAME> \
    local:///opt/spark/examples/jars/spark-examples_<VERSION>.jar \
    <HDFS_FILE_LOCATION>
```
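The pre-created krb5 ConfigMap referenced in (3b) can be built the same way as the Hadoop one shown earlier; a minimal sketch, assuming the local `/etc/krb5.conf` should be used (the key name `krb5.conf` is an assumption about how the file is looked up when mounted):

```bash
# Store the local krb5.conf under the key "krb5.conf" so it can be
# mounted into the driver and executor pods.
kubectl create configmap <KRB_CONFIG_MAP_NAME> --from-file=krb5.conf=/etc/krb5.conf
```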
# Event Logging

If your applications are using event logging, the directory where the event logs go
@@ -60,11 +60,13 @@ private[spark] object Constants {
val ENV_CLASSPATH = "SPARK_CLASSPATH"
val ENV_DRIVER_BIND_ADDRESS = "SPARK_DRIVER_BIND_ADDRESS"
val ENV_SPARK_CONF_DIR = "SPARK_CONF_DIR"
val ENV_SPARK_USER = "SPARK_USER"
Contributor: I guess this is for setting the correct user. But I think hadoop libs should pick the correct user like in SparkContext where
// Spark app configs for containers
val SPARK_CONF_VOLUME = "spark-conf-volume"
val SPARK_CONF_DIR_INTERNAL = "/opt/spark/conf"
val SPARK_CONF_FILE_NAME = "spark.properties"
val SPARK_CONF_PATH = s"$SPARK_CONF_DIR_INTERNAL/$SPARK_CONF_FILE_NAME"
val ENV_HADOOP_TOKEN_FILE_LOCATION = "HADOOP_TOKEN_FILE_LOCATION"

// BINDINGS
val ENV_PYSPARK_PRIMARY = "PYSPARK_PRIMARY"
@@ -78,4 +80,29 @@ private[spark] object Constants {
val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc"
val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
val MEMORY_OVERHEAD_MIN_MIB = 384L

// Hadoop Configuration
val HADOOP_FILE_VOLUME = "hadoop-properties"
val KRB_FILE_VOLUME = "krb5-file"
val HADOOP_CONF_DIR_PATH = "/opt/hadoop/conf"
val KRB_FILE_DIR_PATH = "/etc"
val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR"
val HADOOP_CONFIG_MAP_NAME =
  "spark.kubernetes.executor.hadoopConfigMapName"
val KRB5_CONFIG_MAP_NAME =
  "spark.kubernetes.executor.krb5ConfigMapName"

// Kerberos Configuration
val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME = "delegation-tokens"
val KERBEROS_DT_SECRET_NAME =
  "spark.kubernetes.kerberos.dt-secret-name"
val KERBEROS_DT_SECRET_KEY =
  "spark.kubernetes.kerberos.dt-secret-key"
val KERBEROS_SPARK_USER_NAME =
  "spark.kubernetes.kerberos.spark-user-name"
val KERBEROS_SECRET_KEY = "hadoop-tokens"

// Hadoop credentials secrets for the Spark app.
val SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR = "/mnt/secrets/hadoop-credentials"
val SPARK_APP_HADOOP_SECRET_VOLUME_NAME = "hadoop-secret"
}
Review comment: This check has been restrictive for customers in the past. There are cases where spark-submit should not have the file locally and the keytab should be mounted as a secret within the cluster, for example.

Reply: This check can be removed, but I included it since I believed that the keytab shouldn't be stored as a secret for security reasons, and should instead be only accessible from the JVM.