
Conversation

@ifilonenko
Contributor

What changes were proposed in this pull request?

This fix includes just the integration tests for Kerberos support.

How was this patch tested?

This patch includes a single-node, pseudo-distributed Kerberized Hadoop cluster for testing Kerberos interaction. The keytabs are shared via Persistent Volumes, and all communication happens within the same Kubernetes cluster.

@ifilonenko
Contributor Author

ifilonenko commented Oct 2, 2018

@mccheah @liyinan926 @erikerlandson for review

Things to note:

  • clusterrolebindings might be needed to ensure the driver can set up the necessary resources (a minimal sketch follows this list).
  • Is there any way to include hadoop-2.7.3.tgz so that the hadoop-base:latest image can be built on the fly, as opposed to pulling from ifilonenko/hadoop-base:latest?
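
A minimal sketch of the kind of binding a test user might create up front; the namespace, service account, and binding names below are illustrative and not something this PR defines:

# Hypothetical setup: give the driver's service account enough permissions
# to create the resources the integration tests need.
kubectl create serviceaccount spark -n spark-test
kubectl create clusterrolebinding spark-test-role \
  --clusterrole=edit \
  --serviceaccount=spark-test:spark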

@SparkQA

SparkQA commented Oct 2, 2018

Test build #96845 has finished for PR 22608 at commit 54316ba.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2018

@SparkQA

SparkQA commented Oct 2, 2018

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3618/

@SparkQA

SparkQA commented Oct 2, 2018

Test build #96854 has finished for PR 22608 at commit 56e2c6e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2018

@SparkQA

SparkQA commented Oct 2, 2018

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3627/

@erikerlandson
Contributor

@ifilonenko can we work with the existing service-account-name config parameters for obtaining the resource permissions?
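
For reference, a sketch of pointing the tests at a pre-created service account via the existing parameter; the namespace, account name, master URL, and example jar path are illustrative:

# Sketch: reuse the existing service-account config instead of having the
# tests create their own bindings.
spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=spark-test \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar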

@erikerlandson
Contributor

Re: hadoop-2.7.3.tgz, is that something Shane needs to install on the testing infra to build the images you want?

@erikerlandson
Contributor

Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release.

@ifilonenko
Contributor Author

ifilonenko commented Oct 7, 2018

@erikerlandson the clusterrolebinding is something the user who is testing should set up, so we can disregard that bullet point from the conversation. However, what are your thoughts on calling an external docker image like ifilonenko/hadoop-base:latest for now? Otherwise, the hadoop-base image would have to be built in the docker-image-builder, and the distribution would have to contain the hadoop-2.7.3.tgz file for the image to build.

Although this is a large patch, its impact on existing code is small, and it is nearly all testing code. Unless the tests themselves are unstable, I'd consider this plausible to include with the 2.4 release.

Very true; this feature is very isolated and was designed to be extremely stable (via the WatcherCaches), but it should only be merged together with #21669. I would like a review on the design so that we can merge this ASAP once that PR is merged, as the two changes are completely isolated from each other.

@felixcheung
Member

calling an external docker image like ifilonenko/hadoop-base:latest for now

for now it's probably ok, but is there a solution before the next release?

@ifilonenko
Contributor Author

for now it's probably ok, but is there a solution before the next release?

This integration-test suite works seamlessly and is quite robust when rebased on top of the Kerberos PR, so if we leave this PR as is, it should be good to merge. Pulling from ifilonenko/hadoop-base:latest makes it so much easier :)

@SparkQA

SparkQA commented Oct 16, 2018

Test build #97416 has finished for PR 22608 at commit 436f652.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2018

@SparkQA

SparkQA commented Oct 25, 2018

@SparkQA

SparkQA commented Oct 25, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4510/

@SparkQA

SparkQA commented Oct 26, 2018

Test build #98053 has finished for PR 22608 at commit 66fe408.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 26, 2018

Test build #98055 has finished for PR 22608 at commit 0639099.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Oct 26, 2018

Just noticed this, but could you open a separate bug for adding these tests, instead of reusing the one where the main code was added? It's a large enough change that it should be tracked separately.

@ifilonenko
Contributor Author

Just noticed this, but could you open a separate bug for adding these tests, instead of reusing the one where the main code was added? It's a large enough change that it should be tracked separately.

I had already filed https://issues.apache.org/jira/browse/SPARK-25750 and have linked this PR to that JIRA issue.

@SparkQA

SparkQA commented Oct 27, 2018

@SparkQA

SparkQA commented Oct 27, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4551/

@SparkQA

SparkQA commented Oct 27, 2018

Test build #98109 has finished for PR 22608 at commit b0696da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# the examples directory is cleaned up before generating the distribution tarball, so this
# issue does not occur.
IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
IMG_PATH=resource-managers/kubernetes/docker/src
Contributor

Do you still need changes to this file, given that you have moved the test stuff out?

Contributor Author

The Dockerfiles and files for building the Kerberos/Hadoop docker images are in src/test. It still seemed like a logical place to keep them, given the /test tag, no?

Contributor

I have the same question. It doesn't seem like you're actually using this script for the new test stuff, nor changing any of the existing calls to it, so do you need any of the changes being made here?

if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then
mkdir -p "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
Contributor

Ditto. Why is this change still needed?

@SparkQA

SparkQA commented Oct 30, 2018

@SparkQA

SparkQA commented Oct 30, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4640/

restartPolicy: Always
volumes:
- name: kerb-keytab
persistentVolumeClaim:
Contributor

With a StatefulSet, you don't need to explicitly manage PVCs. You can use .spec.volumeClaimTemplates, and the StatefulSet controller automatically creates the claims (or binds to the existing ones it created before). A sketch follows below.
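
A sketch of what that could look like for the keytab volume; the resource names, size, and mount path are illustrative, and the image is the one already referenced in this PR:

# StatefulSet sketch: the controller creates and binds the keytab claim
# itself via volumeClaimTemplates, so no separate PVC object is needed.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kerberos
spec:
  serviceName: kerberos
  replicas: 1
  selector:
    matchLabels:
      app: kerberos
  template:
    metadata:
      labels:
        app: kerberos
    spec:
      containers:
      - name: kdc
        image: ifilonenko/hadoop-base:latest
        volumeMounts:
        - name: kerb-keytab
          mountPath: /var/keytabs
  volumeClaimTemplates:
  - metadata:
      name: kerb-keytab
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
EOF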

Contributor

+1


/**
* This class is responsible for ensuring that the persistent volume claims are bound
* to the correct persistent volume and that they are both created before launching the
Contributor

With StatefulSets, you probably don't need this.

Contributor

+1

@SparkQA

SparkQA commented Oct 31, 2018

Test build #98281 has finished for PR 22608 at commit 0de8c87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait UnevaluableAggregate extends DeclarativeAggregate
  • case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes
  • case class Count(children: Seq[Expression]) extends DeclarativeAggregate
  • abstract class UnevaluableBooleanAggBase(arg: Expression)
  • case class EveryAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class AnyAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class SomeAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)
  • case class UnresolvedCatalystToExternalMap(

Contributor

@vanzin vanzin left a comment

You seem to be running different pods for KDC, NN and DN. Is there an advantage to that?

Seems to me you could do the same thing with a single pod and simplify things here.

The integration-test README also mentions "3 CPUs and 4G of memory". Is that still enough with these new things being run?
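
If those defaults turn out to be too small once the KDC, NN, and DN pods are added, bumping the resources is straightforward; the numbers below are a guess rather than measured values, and assume minikube is the test backend:

# Sketch only: restart minikube with more headroom for the extra Hadoop/KDC pods.
minikube stop
minikube start --cpus 4 --memory 6144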

mkdir -p "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/scripts "$DISTDIR/kubernetes/"
Contributor

This is following the existing pattern in the line below; but is there a purpose in packaging these test artifacts with a binary Spark distribution?

Seems to me like they should be left in the source package and that's it.

hdfs dfs -copyFromLocal /people.txt /user/userone

hdfs dfs -chmod -R 755 /user/userone
hdfs dfs -chown -R ifilonenko /user/userone
Contributor

ifilonenko?

--conf spark.kubernetes.namespace=${NAMESPACE} \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
Contributor

Adding files to the classpath does not do anything.

$ scala -cp /etc/krb5.conf
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = null

$ scala -cp /etc
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = file:/etc/krb5.conf

So this doesn't seem needed, especially since I'd expect spark-submit or the k8s backend code to add the Hadoop conf to the driver's classpath somehow.
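
If the conf files do need to be on the classpath explicitly, a sketch of the directory-based form of the same setting, using the paths already shown above:

# Sketch: put the directories on the classpath rather than individual files,
# so the resources resolve as /core-site.xml, /krb5.conf, etc.
--conf spark.driver.extraClassPath=/opt/spark/hconf:/etc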

@mccheah
Contributor

mccheah commented Nov 2, 2018

You seem to be running different pods for KDC, NN and DN. Is there an advantage to that?

Seems to me you could do the same thing with a single pod and simplify things here.

The it README also mentions "3 CPUs and 4G of memory". Is that still enough with these new things that are run?

I think we want different images for each, but that's fine: just run a pod with those three containers in it (a sketch is shown below).
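
A sketch of that single-pod layout; the start scripts and the claim name are hypothetical placeholders, and the image is the one already used in this PR:

# Pod sketch: KDC, NameNode, and DataNode as three containers in one pod,
# sharing the keytab volume. The /start-*.sh entrypoints are hypothetical.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kerberized-hadoop
spec:
  restartPolicy: Always
  containers:
  - name: kdc
    image: ifilonenko/hadoop-base:latest
    command: ["/start-kdc.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  - name: nn
    image: ifilonenko/hadoop-base:latest
    command: ["/start-namenode.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  - name: dn
    image: ifilonenko/hadoop-base:latest
    command: ["/start-datanode.sh"]
    volumeMounts:
    - name: kerb-keytab
      mountPath: /var/keytabs
  volumes:
  - name: kerb-keytab
    persistentVolumeClaim:
      claimName: server-keytab
EOF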

@vanzin
Contributor

vanzin commented Nov 2, 2018

Think we want different images for each

You don't need to, right? You can have a single image with all the stuff needed. That would also make setting up the test faster (fewer images to build).

just run a pod with those three containers

That's mostly me still getting used to names here; to me pod == one container running with some stuff.

But in any case, my main concern here is resource utilization: if we can keep things slimmer by running fewer containers, I think that's better. Individually, the NN, DN, and KDC don't need a lot of resources for this particular test to run.

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- must be set for HDFS libraries to obtain delegation tokens -->
Contributor

You could put this in hdfs-site.xml and avoid having to deal with this extra file.
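
A sketch of what that could look like; it is an assumption here that the property behind that comment is hadoop.security.authentication (the usual one HDFS clients need before requesting delegation tokens), and the target path is just the conf directory referenced elsewhere in this PR:

# Sketch only: move the property into hdfs-site.xml so the separate
# core-site.xml override file is no longer needed. Other hdfs-site.xml
# properties used by the test image are omitted here.
cat > /opt/spark/hconf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- must be set for HDFS libraries to obtain delegation tokens -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
EOF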

@mccheah
Contributor

mccheah commented Nov 2, 2018

It depends on how we're getting the Hadoop images. If we're building everything from scratch, we could run everything in one container, though having a container run more than one process simultaneously isn't common; it's more common for a single container to have a single responsibility / process. But you can group multiple containers that have related responsibilities into a single pod, hence we'll use three containers in one pod here.

If we're pulling Hadoop images from elsewhere (which it sounds like we aren't doing in the Apache ecosystem in general), then we'd need to build our own separate image for the KDC anyway.

Multiple containers in the same pod all share the same resource footprint and limit boundaries.

@vanzin
Contributor

vanzin commented Dec 20, 2018

@ifilonenko any plans to bring this up to date?

@ifilonenko
Contributor Author

@vanzin yeah, I will resync the branch and resolve the comments to bring this feature in.

@skonto
Contributor

skonto commented Feb 5, 2019

@mccheah it is possible to use multiple processes per container for testing: https://cloud.google.com/solutions/best-practices-for-building-containers (several vendors do).
Still, it is much cleaner to fail at the container level when debugging, so I also think a pod is the better choice (it will be as if we are running all the HDFS components on the same host). The KDC should also be on its own; at least, this is how we tested this with integration tests on Mesos in the past, and it was much cleaner.
There is also an option to use Hadoop images: https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html so in the future it might be better to integrate with Kerberos there if people want to test against a specific Hadoop version (not sure if the version matters).

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
This is the work on setting up Secure HDFS interaction with Spark-on-K8S.
The architecture is discussed in this community-wide google [doc](https://docs.google.com/document/d/1RBnXD9jMDjGonOdKJ2bA1lN4AAV_1RwpU_ewFuCNWKg)
This initiative can be broken down into 4 Stages

**STAGE 1**
- [x] Detecting `HADOOP_CONF_DIR` environmental variable and using Config Maps to store all Hadoop config files locally, while also setting `HADOOP_CONF_DIR` locally in the driver / executors

**STAGE 2**
- [x] Grabbing `TGT` from `LTC` or using keytabs+principal and creating a `DT` that will be mounted as a secret or using a pre-populated secret

**STAGE 3**
- [x] Driver

**STAGE 4**
- [x] Executor

## How was this patch tested?
Locally tested on a single-node, pseudo-distributed Kerberized Hadoop Cluster
- [x] E2E Integration tests apache#22608
- [ ] Unit tests

## Docs and Error Handling?
- [x] Docs
- [x] Error Handling

## Contribution Credit
kimoonkim skonto

Closes apache#21669 from ifilonenko/secure-hdfs.

Lead-authored-by: Ilan Filonenko <if56@cornell.edu>
Co-authored-by: Ilan Filonenko <ifilondz@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin
Contributor

vanzin commented Mar 4, 2019

I'm closing this for now due to inactivity. If the branch is updated the PR will reopen. Or someone else can pick this up.

@vanzin vanzin closed this Mar 4, 2019
markterm pushed a commit to CodecLondon/spark that referenced this pull request Jul 4, 2019