[SPARK-25750][K8S][TESTS] Kerberos Support Integration Tests #22608
Conversation
|
@mccheah @liyinan926 @erikerlandson for review. Things to note:
|
|
Test build #96845 has finished for PR 22608 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #96854 has finished for PR 22608 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@ifilonenko can we work with the existing service-account-name config parameters for obtaining the resource permissions? |
|
re: hadoop-2.7.3.tgz, is that something Shane needs to install on the testing infra to build the images you want?
|
Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release. |
|
@erikerlandson the clusterrolebinding is something the user who is testing should set up. As such, we may disregard that bullet-point from the conversation. However, I am wondering what the thoughts are on calling an external docker image like:
Very true, this feature is very isolated and was designed to be extremely stable (via the WatcherCaches), but it should only be merged with #21669. I would like a review on the design so that we can merge this ASAP once the above PR is merged, since the two are completely independent.
for now it's probably ok, but is there a solution before the next release? |
This integration-test suite works seamlessly and is quite robust when rebased on top of the Kerberos PR. So if we leave this PR as is, it should be good for merge. Pulling from
|
Test build #97416 has finished for PR 22608 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #98053 has finished for PR 22608 at commit
|
|
Test build #98055 has finished for PR 22608 at commit
|
|
Just noticed this, but could you open a separate bug for adding these tests, instead of re-using the one where the main code was added? It's a large enough change that it deserves its own issue.
I had https://issues.apache.org/jira/browse/SPARK-25750 and linked this PR to that JIRA issue. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #98109 has finished for PR 22608 at commit
|
# the examples directory is cleaned up before generating the distribution tarball, so this
# issue does not occur.
-IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
+IMG_PATH=resource-managers/kubernetes/docker/src
Do you still need changes to this file given you have moved the test stuff out?
The dockerfiles and files for building the kerberos/hadoop docker images are in src/test. It still seemed like a logical place to keep them, with the /test tag, no?
I have the same question. It doesn't seem like you're actually using this script for the new test stuff, nor changing any of the existing calls to it, so do you need any of the changes being made here?
if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then
  mkdir -p "$DISTDIR/kubernetes/"
-  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
+  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
Ditto. Why is this change still needed?
resource-managers/kubernetes/docker/src/test/scripts/populate-data.sh
resource-managers/kubernetes/integration-tests/kerberos-yml/data-populator-deployment.yml
resource-managers/kubernetes/integration-tests/kerberos-yml/data-populator-service.yml
resource-managers/kubernetes/integration-tests/kerberos-yml/dn1-deployment.yml
resource-managers/kubernetes/integration-tests/kerberos-yml/kerberos-deployment.yml
resource-managers/kubernetes/integration-tests/kerberos-yml/kerberos-test.yml
resource-managers/kubernetes/integration-tests/kerberos-yml/nn-deployment.yml
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
      restartPolicy: Always
      volumes:
        - name: kerb-keytab
          persistentVolumeClaim:
With a StatefulSet, you don't need to explicitly manage PVCs. You can use .spec.volumeClaimTemplates. The StatefulSet controller automatically creates the PVC (or binds to the existing one it created before).
+1
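For reference, a minimal sketch of the volumeClaimTemplates approach; everything below (names, image, storage size) is hypothetical and not taken from this PR:

```yaml
# Hypothetical sketch: let the StatefulSet controller create and bind the
# keytab PVC from a template instead of managing PVC objects explicitly.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kerberos-kdc
spec:
  serviceName: kerberos-kdc
  replicas: 1
  selector:
    matchLabels:
      app: kerberos-kdc
  template:
    metadata:
      labels:
        app: kerberos-kdc
    spec:
      containers:
        - name: kdc
          image: kerberos-kdc:latest   # hypothetical image name
          volumeMounts:
            - name: kerb-keytab
              mountPath: /var/keytabs
  volumeClaimTemplates:
    - metadata:
        name: kerb-keytab
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```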
|
|
/**
 * This class is responsible for ensuring that the persistent volume claims are bound
 * to the correct persistent volume and that they are both created before launching the
With StatefulSets, you probably don't need this.
+1
|
Test build #98281 has finished for PR 22608 at commit
|
You seem to be running different pods for KDC, NN and DN. Is there an advantage to that?
Seems to me you could do the same thing with a single pod and simplify things here.
The integration-tests README also mentions "3 CPUs and 4G of memory". Is that still enough with these new things that are run?
  mkdir -p "$DISTDIR/kubernetes/"
-  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
+  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
+  cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/scripts "$DISTDIR/kubernetes/"
This is following the existing pattern in the line below; but is there a purpose in packaging these test artifacts with a binary Spark distribution?
Seems to me like they should be left in the source package and that's it.
hdfs dfs -copyFromLocal /people.txt /user/userone

hdfs dfs -chmod -R 755 /user/userone
hdfs dfs -chown -R ifilonenko /user/userone
ifilonenko?
  --conf spark.kubernetes.namespace=${NAMESPACE} \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-hdfs \
  --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
Adding individual files to the classpath does not do anything; classpath entries need to be directories or jars:
$ scala -cp /etc/krb5.conf
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = null
$ scala -cp /etc
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = file:/etc/krb5.conf
So this seems unnecessary. Also, I'd expect spark-submit or the k8s backend code to add the hadoop conf directory to the driver's classpath somehow.
Think we want different images for each, but that's fine - just run a pod with those three containers in it. |
You don't need to, right? You can have a single image with all the stuff needed. That would also make setting up the test faster (fewer images to build).
That's mostly me still getting used to names here; to me pod == one container running with some stuff. But in any case, my main concern in this case is resource utilization - if we can keep things slimmer by running fewer containers, I think that's better. Individually, the NN, DN and the KDC don't need a lot of resources for this particular test to run.
<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- must be set for HDFS libraries to obtain delegation tokens -->
You could put this in hdfs-site.xml and avoid having to deal with this extra file.
|
It depends on how we're getting the Hadoop images. If we're building everything from scratch, we could run everything in one container, though having a container run more than one process simultaneously isn't common; it's more common for a single container to have a single responsibility / process. But you can group multiple containers that have related responsibilities into a single pod, hence we'll use 3 containers in one pod here. If we're pulling Hadoop images from elsewhere - which it sounds like we aren't doing in the Apache ecosystem in general - then we'd need to build our own separate image for the KDC anyway. Multiple containers in the same pod all share the same resource footprint and limit boundaries.
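As a rough illustration of that layout (container names, images, commands, and resource requests below are hypothetical, not taken from this PR):

```yaml
# Hypothetical sketch: NN, DN, and KDC as three containers in a single pod,
# sharing the pod's resource footprint. Modest requests keep the test slim.
apiVersion: v1
kind: Pod
metadata:
  name: kerberized-hadoop
spec:
  containers:
    - name: kerberos-kdc
      image: kerberos-kdc:latest
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
    - name: hdfs-namenode
      image: kerberized-hadoop:latest
      command: ["/opt/start-namenode.sh"]
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
    - name: hdfs-datanode
      image: kerberized-hadoop:latest
      command: ["/opt/start-datanode.sh"]
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
```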
|
@ifilonenko any plans to bring this up to date? |
|
@vanzin yeah, will resync branch and resolve comments to bring this feature in |
|
@mccheah it is possible to use multiple processes per container for testing: https://cloud.google.com/solutions/best-practices-for-building-containers (several vendors do). |
## What changes were proposed in this pull request?

This is the work on setting up Secure HDFS interaction with Spark-on-K8S. The architecture is discussed in this community-wide google [doc](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg)

This initiative can be broken down into 4 Stages

**STAGE 1**
- [x] Detecting `HADOOP_CONF_DIR` environment variable and using Config Maps to store all Hadoop config files locally, while also setting `HADOOP_CONF_DIR` locally in the driver / executors

**STAGE 2**
- [x] Grabbing `TGT` from `LTC` or using keytabs+principal and creating a `DT` that will be mounted as a secret or using a pre-populated secret

**STAGE 3**
- [x] Driver

**STAGE 4**
- [x] Executor

## How was this patch tested?

Locally tested on a single-node, pseudo-distributed Kerberized Hadoop Cluster

- [x] E2E Integration tests apache#22608
- [ ] Unit tests

## Docs and Error Handling?

- [x] Docs
- [x] Error Handling

## Contribution Credit

kimoonkim skonto

Closes apache#21669 from ifilonenko/secure-hdfs.

Lead-authored-by: Ilan Filonenko <if56@cornell.edu>
Co-authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
|
I'm closing this for now due to inactivity. If the branch is updated the PR will reopen. Or someone else can pick this up. |
What changes were proposed in this pull request?
This fix includes just the integration tests for Kerberos Support.
How was this patch tested?
This patch includes a single-node, pseudo-distributed Kerberized Hadoop cluster for the purpose of testing Kerberos interaction. The keytabs are shared via Persistent Volumes, and all communication happens within the same Kubernetes cluster.
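For illustration, a minimal sketch of how a keytab volume might be shared through a PersistentVolume/PersistentVolumeClaim pair; the names, paths, and hostPath backing below are hypothetical (a hostPath PV only makes sense on a single-node test cluster such as minikube):

```yaml
# Hypothetical sketch: a hostPath-backed PV and the claim that pods mount
# to read the keytabs the KDC container has written.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kerb-keytab-pv
spec:
  storageClassName: ""          # avoid binding through a default StorageClass
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tmp/keytabs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kerb-keytab
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```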