
Conversation

@ifilonenko
Contributor

What changes were proposed in this pull request?

This fix includes just the integration tests for Kerberos support.

How was this patch tested?

This patch includes a single-node, pseudo-distributed Kerberized Hadoop cluster for testing Kerberos interaction. The keytabs are shared via Persistent Volumes, and all communication happens within the same Kubernetes cluster.

@ifilonenko
Contributor Author

ifilonenko commented Oct 2, 2018

@mccheah @liyinan926 @erikerlandson for review

Things to note:

  • clusterrolebindings might be needed to ensure the driver can set up the necessary resources (a minimal sketch follows this list).
  • Is there any way to include hadoop-2.7.3.tgz so that the hadoop-base:latest image can be built on the fly, as opposed to pulling from ifilonenko/hadoop-base:latest?
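
A minimal sketch of the kind of binding a test user might create up front; the namespace, service account, and binding names below are illustrative and not something this PR defines:

# Hypothetical setup: give the driver's service account enough permissions
# to create the resources the integration tests need.
kubectl create serviceaccount spark -n spark-test
kubectl create clusterrolebinding spark-test-role \
  --clusterrole=edit \
  --serviceaccount=spark-test:spark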

@SparkQA

SparkQA commented Oct 2, 2018

Test build #96845 has finished for PR 22608 at commit 54316ba.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2018

@SparkQA

SparkQA commented Oct 2, 2018

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3618/

@SparkQA

SparkQA commented Oct 2, 2018

Test build #96854 has finished for PR 22608 at commit 56e2c6e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2018

@SparkQA

SparkQA commented Oct 2, 2018

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3627/

@erikerlandson
Contributor

@ifilonenko can we work with the existing service-account-name config parameters for obtaining the resource permissions?
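
For reference, a sketch of pointing the tests at a pre-created service account via the existing parameter; the namespace, account name, master URL, and example jar path are illustrative:

# Sketch: reuse the existing service-account config instead of having the
# tests create their own bindings.
spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=spark-test \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar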

@erikerlandson
Contributor

Re: hadoop-2.7.3.tgz, is that something Shane needs to install on the testing infra to build the images you want?

@erikerlandson
Contributor

Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release.

@ifilonenko
Contributor Author

ifilonenko commented Oct 7, 2018

@erikerlandson the clusterrolebinding is something the user who is testing should set up, so we can disregard that bullet point from the conversation. However, what are your thoughts on calling an external docker image like ifilonenko/hadoop-base:latest for now? Otherwise, the hadoop-base image would have to be built in the docker-image-builder, and the distribution would have to contain the hadoop-2.7.3.tgz file for the image to build.

Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release.

Very true; this feature is very isolated and was designed to be extremely stable (via the WatcherCaches), but it should only be merged together with #21669. I would like a review on the design so that we can merge this ASAP once that PR is merged, as the two changes are completely isolated from each other.

@felixcheung
Member

calling an external docker image like ifilonenko/hadoop-base:latest for now

for now it's probably ok, but is there a solution before the next release?

@ifilonenko
Contributor Author

for now it's probably ok, but is there a solution before the next release?

This integration-test suite works seamlessly and is quite robust when rebased on top of the Kerberos PR, so if we leave this PR as is, it should be good to merge. Pulling from ifilonenko/hadoop-base:latest makes it so much easier :)

@SparkQA

SparkQA commented Oct 16, 2018

Test build #97416 has finished for PR 22608 at commit 436f652.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2018

@SparkQA

SparkQA commented Oct 25, 2018

@SparkQA

SparkQA commented Oct 25, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4510/

@SparkQA

SparkQA commented Oct 26, 2018

Test build #98053 has finished for PR 22608 at commit 66fe408.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 26, 2018

Test build #98055 has finished for PR 22608 at commit 0639099.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Oct 26, 2018

Just noticed this, but could you open a separate bug for adding these tests, instead of reusing the one where the main code was added? It's a large enough change that it should be tracked separately.

@ifilonenko
Contributor Author

Just noticed this, but could you open a separate bug for adding these tests, instead of reusing the one where the main code was added? It's a large enough change that it should be tracked separately.

I had already filed https://issues.apache.org/jira/browse/SPARK-25750 and have linked this PR to that JIRA issue.

@SparkQA

SparkQA commented Oct 27, 2018

@SparkQA

SparkQA commented Oct 27, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4551/

@SparkQA

SparkQA commented Oct 27, 2018

Test build #98109 has finished for PR 22608 at commit b0696da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# the examples directory is cleaned up before generating the distribution tarball, so this
# issue does not occur.
IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
IMG_PATH=resource-managers/kubernetes/docker/src
Contributor

Do you still need changes to this file, given that you have moved the test stuff out?

Contributor Author

The Dockerfiles and files for building the Kerberos/Hadoop docker images are in src/test. It still seemed like a logical place to keep them, given the /test tag, no?

Contributor

I have the same question. It doesn't seem like you're actually using this script for the new test stuff, nor changing any of the existing calls to it, so do you need any of the changes being made here?

if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then
mkdir -p "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
Contributor

Ditto. Why is this change still needed?

@SparkQA

SparkQA commented Oct 30, 2018

@SparkQA

SparkQA commented Oct 30, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4640/

restartPolicy: Always
volumes:
- name: kerb-keytab
persistentVolumeClaim:
Contributor

With a StatefulSet, you don't need to explicitly manage PVCs. You can use .spec.volumeClaimTemplates, and the StatefulSet controller automatically creates the claims (or binds to the existing ones it created before). A sketch follows below.
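
A sketch of what that could look like for the keytab volume; the resource names, size, and mount path are illustrative, and the image is the one already referenced in this PR:

# StatefulSet sketch: the controller creates and binds the keytab claim
# itself via volumeClaimTemplates, so no separate PVC object is needed.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kerberos
spec:
  serviceName: kerberos
  replicas: 1
  selector:
    matchLabels:
      app: kerberos
  template:
    metadata:
      labels:
        app: kerberos
    spec:
      containers:
      - name: kdc
        image: ifilonenko/hadoop-base:latest
        volumeMounts:
        - name: kerb-keytab
          mountPath: /var/keytabs
  volumeClaimTemplates:
  - metadata:
      name: kerb-keytab
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
EOF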

Contributor

+1


/**
* This class is responsible for ensuring that the persistent volume claims are bound
* to the correct persistent volume and that they are both created before launching the
Contributor

With StatefulSets, you probably don't need this.

Contributor

+1

@SparkQA

SparkQA commented Oct 31, 2018

Test build #98281 has finished for PR 22608 at commit 0de8c87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait UnevaluableAggregate extends DeclarativeAggregate
  • case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes
  • case class Count(children: Seq[Expression]) extends DeclarativeAggregate
  • abstract class UnevaluableBooleanAggBase(arg: Expression)
  • case class EveryAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class AnyAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class SomeAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class UnresolvedCatalystToExternalMap(

Contributor

@vanzin vanzin left a comment

You seem to be running different pods for KDC, NN and DN. Is there an advantage to that?

Seems to me you could do the same thing with a single pod and simplify things here.

The integration-test README also mentions "3 CPUs and 4G of memory". Is that still enough with these new things being run?
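
If those defaults turn out to be too small once the KDC, NN, and DN pods are added, bumping the resources is straightforward; the numbers below are a guess rather than measured values, and assume minikube is the test backend:

# Sketch only: restart minikube with more headroom for the extra Hadoop/KDC pods.
minikube stop
minikube start --cpus 4 --memory 6144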

mkdir -p "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/scripts "$DISTDIR/kubernetes/"
Contributor

This is following the existing pattern in the line below; but is there a purpose in packaging these test artifacts with a binary Spark distribution?

Seems to me like they should be left in the source package and that's it.

hdfs dfs -copyFromLocal /people.txt /user/userone

hdfs dfs -chmod -R 755 /user/userone
hdfs dfs -chown -R ifilonenko /user/userone
Contributor

ifilonenko?

--conf spark.kubernetes.namespace=${NAMESPACE} \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
Contributor

Adding files to the classpath does not do anything.

$ scala -cp /etc/krb5.conf
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = null

$ scala -cp /etc
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = file:/etc/krb5.conf

So this doesn't seem needed, especially since I'd expect spark-submit or the k8s backend code to add the Hadoop conf to the driver's classpath somehow.
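
If the conf files do need to be on the classpath explicitly, a sketch of the directory-based form of the same setting, using the paths already shown above:

# Sketch: put the directories on the classpath rather than individual files,
# so the resources resolve as /core-site.xml, /krb5.conf, etc.
--conf spark.driver.extraClassPath=/opt/spark/hconf:/etc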

@mccheah
Contributor

mccheah commented Nov 2, 2018

You seem to be running different pods for KDC, NN and DN. Is there an advantage to that?

Seems to me you could do the same thing with a single pod and simplify things here.

The it README also mentions "3 CPUs and 4G of memory". Is that still enough with these new things that are run?

I think we want different images for each, but that's fine: just run a pod with those three containers in it (a sketch is shown below).
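
A sketch of that single-pod layout; the start scripts and the claim name are hypothetical placeholders, and the image is the one already used in this PR:

# Pod sketch: KDC, NameNode, and DataNode as three containers in one pod,
# sharing the keytab volume. The /start-*.sh entrypoints are hypothetical.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kerberized-hadoop
spec:
  restartPolicy: Always
  containers:
  - name: kdc
    image: ifilonenko/hadoop-base:latest
    command: ["/start-kdc.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  - name: nn
    image: ifilonenko/hadoop-base:latest
    command: ["/start-namenode.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  - name: dn
    image: ifilonenko/hadoop-base:latest
    command: ["/start-datanode.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  volumes:
  - name: kerb-keytab
    persistentVolumeClaim:
      claimName: server-keytab
EOF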

@vanzin
Contributor

vanzin commented Nov 2, 2018

Think we want different images for each

You don't need to, right? You can have a single image with all the stuff needed. That would also make setting up the test faster (fewer images to build).

just run a pod with those three containers

That's mostly me still getting used to names here; to me pod == one container running with some stuff.

But in any case, my main concern here is resource utilization: if we can keep things slimmer by running fewer containers, I think that's better. Individually, the NN, DN, and KDC don't need a lot of resources for this particular test to run.

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- must be set for HDFS libraries to obtain delegation tokens -->
Contributor

You could put this in hdfs-site.xml and avoid having to deal with this extra file.
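
A sketch of what that could look like; it is an assumption here that the property behind that comment is hadoop.security.authentication (the usual one HDFS clients need before requesting delegation tokens), and the target path is just the conf directory referenced elsewhere in this PR:

# Sketch only: move the property into hdfs-site.xml so the separate
# core-site.xml override file is no longer needed. Other hdfs-site.xml
# properties used by the test image are omitted here.
cat > /opt/spark/hconf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- must be set for HDFS libraries to obtain delegation tokens -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
EOF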

@mccheah
Contributor

mccheah commented Nov 2, 2018

It depends on how we're getting the Hadoop images. If we're building everything from scratch, we could run everything in one container, though having a container run more than one process simultaneously isn't common; it's more common for a single container to have a single responsibility / process. But you can group multiple containers that have related responsibilities into a single pod, hence we'll use three containers in one pod here.

If we're pulling Hadoop images from elsewhere (which it sounds like we aren't doing in the Apache ecosystem in general), then we'd need to build our own separate image for the KDC anyway.

Multiple containers in the same pod all share the same resource footprint and limit boundaries.

@vanzin
Contributor

vanzin commented Dec 20, 2018

@ifilonenko any plans to bring this up to date?

@ifilonenko
Contributor Author

@vanzin yeah, I will resync the branch and resolve the comments to bring this feature in.

@skonto
Contributor

skonto commented Feb 5, 2019

@mccheah it is possible to use multiple processes per container for testing: https://cloud.google.com/solutions/best-practices-for-building-containers (several vendors do).
Still, it is much cleaner to fail at the container level when debugging, so I also think a pod is the better choice (it will be as if we are running all the HDFS components on the same host). The KDC should also be on its own; at least, this is how we tested this with integration tests on Mesos in the past, and it was much cleaner.
There is also an option to use Hadoop images: https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html so in the future it might be better to integrate with Kerberos there if people want to test against a specific Hadoop version (not sure if the version matters).

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
This is the work on setting up Secure HDFS interaction with Spark-on-K8S.
The architecture is discussed in this community-wide google [doc](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg)
This initiative can be broken down into 4 Stages

**STAGE 1**
- [x] Detecting `HADOOP_CONF_DIR` environmental variable and using Config Maps to store all Hadoop config files locally, while also setting `HADOOP_CONF_DIR` locally in the driver / executors

**STAGE 2**
- [x] Grabbing `TGT` from `LTC` or using keytabs+principal and creating a `DT` that will be mounted as a secret or using a pre-populated secret

**STAGE 3**
- [x] Driver

**STAGE 4**
- [x] Executor

## How was this patch tested?
Locally tested on a single-node, pseudo-distributed Kerberized Hadoop Cluster
- [x] E2E Integration tests apache#22608
- [ ] Unit tests

## Docs and Error Handling?
- [x] Docs
- [x] Error Handling

## Contribution Credit
kimoonkim skonto

Closes apache#21669 from ifilonenko/secure-hdfs.

Lead-authored-by: Ilan Filonenko <if56@cornell.edu>
Co-authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin
Contributor

vanzin commented Mar 4, 2019

I'm closing this for now due to inactivity. If the branch is updated the PR will reopen. Or someone else can pick this up.

@vanzin vanzin closed this Mar 4, 2019
markterm pushed a commit to CodecLondon/spark that referenced this pull request Jul 4, 2019