14 changes: 9 additions & 5 deletions bin/docker-image-tool.sh
@@ -53,7 +53,7 @@ function build {
# contain a lot of duplicated jars with the main Spark directory. In a proper distribution,
# the examples directory is cleaned up before generating the distribution tarball, so this
# issue does not occur.
IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
IMG_PATH=resource-managers/kubernetes/docker/src
Contributor:
Do you still need changes to this file given you have moved the test stuff out?

Contributor Author:
The dockerfiles and files for building the kerberos/hadoop docker images are in src/test. It still seemed like a logical place to keep them with the /test tag, no?

Contributor:
I have the same question. It doesn't seem like you're actually using this script for the new test stuff, nor changing any of the existing calls to it, so do you need any of the changes being made here?

JARS=assembly/target/scala-$SPARK_SCALA_VERSION/jars
BUILD_ARGS=(
${BUILD_PARAMS}
@@ -68,7 +68,7 @@ function build {
)
else
# Not passed as arguments to docker, but used to validate the Spark directory.
IMG_PATH="kubernetes/dockerfiles"
IMG_PATH="kubernetes/src"
JARS=jars
BUILD_ARGS=(${BUILD_PARAMS})
fi
@@ -91,23 +91,27 @@ function build {
--build-arg
base_img=$(image_ref spark)
)
local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"}
local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/Dockerfile"}
local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/python/Dockerfile"}
local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/R/Dockerfile"}

# Spark Base
docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
-t $(image_ref spark) \
-f "$BASEDOCKERFILE" .
if [ $? -ne 0 ]; then
error "Failed to build Spark JVM Docker image, please refer to Docker build output for details."
fi

# PySpark
docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
-t $(image_ref spark-py) \
-f "$PYDOCKERFILE" .
if [ $? -ne 0 ]; then
error "Failed to build PySpark Docker image, please refer to Docker build output for details."
fi

# SparkR
docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
-t $(image_ref spark-r) \
-f "$RDOCKERFILE" .
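As the reviewer notes, no existing calls to this script are changed by the patch; only the default Dockerfile locations derived from IMG_PATH move. For reference, a hedged usage sketch (the repository and tag names below are placeholders, not taken from this patch):

# Build the Spark JVM, PySpark, and SparkR images using the default
# Dockerfile paths resolved from IMG_PATH above.
./bin/docker-image-tool.sh -r docker.io/myrepo -t k8s-testing build

# Push the images built above to the repository given with -r.
./bin/docker-image-tool.sh -r docker.io/myrepo -t k8s-testing push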
3 changes: 2 additions & 1 deletion dev/make-distribution.sh
@@ -191,7 +191,8 @@ fi
# Only create and copy the dockerfiles directory if the kubernetes artifacts were built.
if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then
mkdir -p "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
Contributor:
Ditto. Why is this change still needed?

cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/scripts "$DISTDIR/kubernetes/"
Contributor:
This is following the existing pattern in the line below; but is there a purpose in packaging these test artifacts with a binary Spark distribution?

Seems to me like they should be left in the source package and that's it.

cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/tests "$DISTDIR/kubernetes/"
fi

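To make the path changes easier to follow, here is a rough sketch of the kubernetes/ layout these copies produce inside the distribution; the directory names come from the commands above, and the comments are illustrative rather than actual output:

# Approximate contents of "$DISTDIR/kubernetes" after the copies above:
#   src/main/dockerfiles/...   <- production Dockerfiles (spark, bindings)
#   src/test/...               <- Kerberos/Hadoop test Dockerfiles, scripts, data
#   scripts/                   <- integration-test helper scripts
#   tests/                     <- integration-test resources
# This is the tree that docker-image-tool.sh's IMG_PATH="kubernetes/src" and the
# Dockerfile's img_path=kubernetes/src/main/dockerfiles resolve against.
ls -R "$DISTDIR/kubernetes"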
@@ -19,7 +19,7 @@ FROM openjdk:8-alpine

ARG spark_jars=jars
ARG example_jars=examples/jars
ARG img_path=kubernetes/dockerfiles
ARG img_path=kubernetes/src/main/dockerfiles
ARG k8s_tests=kubernetes/tests

# Before building the docker image, first build and make a Spark distribution following
3 changes: 3 additions & 0 deletions resource-managers/kubernetes/docker/src/test/data/people.txt
@@ -0,0 +1,3 @@
Michael, 29
Andy, 30
Justin, 19
@@ -0,0 +1,46 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM centos:7

ARG hadoop_version
ARG k_img_path=kubernetes/src/test

RUN yum -y install krb5-server krb5-workstation
RUN yum -y install java-1.8.0-openjdk-headless
RUN yum -y install apache-commons-daemon-jsvc
RUN yum install net-tools -y
RUN yum install telnet telnet-server -y
RUN yum -y install which

RUN sed -i -e 's/#//' -e 's/default_ccache_name/# default_ccache_name/' /etc/krb5.conf

RUN useradd -u 1098 hdfs

ADD hadoop-${hadoop_version}.tar.gz /
RUN ln -s hadoop-${hadoop_version} hadoop
RUN chown -R -L hdfs /hadoop

COPY ${k_img_path}/hadoop/conf/ssl-server.xml /hadoop/etc/hadoop/
COPY ${k_img_path}/hadoop/conf/yarn-site.xml /hadoop/etc/hadoop/

COPY ${k_img_path}/scripts/start-namenode.sh /
COPY ${k_img_path}/scripts/start-datanode.sh /
COPY ${k_img_path}/scripts/populate-data.sh /
COPY ${k_img_path}/scripts/start-kdc.sh /

COPY ${k_img_path}/data/people.txt /
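A minimal sketch of how an image could be built from this Dockerfile. Everything below is an assumption for illustration: the Dockerfile path, image tag, and Hadoop version are not specified in this diff, and the build context must be a directory that contains kubernetes/src/test (so the COPY paths resolve) plus a matching hadoop-<version>.tar.gz for the ADD instruction:

# Hypothetical invocation; point DOCKERFILE at wherever this file lives.
DOCKERFILE=path/to/this/Dockerfile
docker build \
  --build-arg hadoop_version=2.7.7 \
  -t kerberos-hadoop-base:dev \
  -f "$DOCKERFILE" \
  .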
@@ -0,0 +1,24 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

ARG base_img
FROM $base_img

ARG k_img_path=kubernetes/src/test

COPY ${k_img_path}/scripts/run-kerberos-test.sh /opt/spark/
COPY ${k_img_path}/hadoop/conf /opt/spark/hconf
@@ -0,0 +1,44 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>

<property>
<name>ssl.server.truststore.location</name>
<value>/var/keytabs/hdfs.jks</value>
</property>

<property>
<name>ssl.server.truststore.password</name>
<value>changeme</value>
</property>

<property>
<name>ssl.server.keystore.location</name>
<value>/var/keytabs/hdfs.jks</value>
</property>

<property>
<name>ssl.server.keystore.password</name>
<value>changeme</value>
</property>

<property>
<name>ssl.server.keystore.keypassword</name>
<value>changeme</value>
</property>

</configuration>
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- must be set for HDFS libraries to obtain delegation tokens -->
Contributor:
You could put this in hdfs-site.xml and avoid having to deal with this extra file.

<!-- (hardcoded to use this ID as the renewer) -->
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/_HOST@CLUSTER.LOCAL</value>
</property>
</configuration>
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export PATH=/hadoop/bin:$PATH
export HADOOP_CONF_DIR=/hadoop/etc/hadoop
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
export KRB5CCNAME=KRBCONF
mkdir -p /hadoop/etc/data
cp ${TMP_KRB_LOC} /etc/krb5.conf
cp ${TMP_CORE_LOC} /hadoop/etc/hadoop/core-site.xml
cp ${TMP_HDFS_LOC} /hadoop/etc/hadoop/hdfs-site.xml

until kinit -kt /var/keytabs/hdfs.keytab hdfs/nn.${NAMESPACE}.svc.cluster.local; do sleep 2; done

until (echo > /dev/tcp/nn.${NAMESPACE}.svc.cluster.local/9000) >/dev/null 2>&1; do sleep 2; done

hdfs dfsadmin -safemode wait


hdfs dfs -mkdir -p /user/userone/
hdfs dfs -copyFromLocal /people.txt /user/userone

hdfs dfs -chmod -R 755 /user/userone
hdfs dfs -chown -R ifilonenko /user/userone
Contributor:
ifilonenko?
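One possible way to address the question above, sketched as a suggestion rather than as part of the patch: read the owner from an environment variable instead of hardcoding a username.

# Hypothetical tweak: HDFS_DATA_OWNER is an assumed variable name, falling back
# to the "userone" principal that start-kdc.sh creates.
hdfs dfs -chown -R ${HDFS_DATA_OWNER:-userone} /user/userone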

@@ -0,0 +1,40 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
sed -i -e 's/#//' -e 's/default_ccache_name/# default_ccache_name/' /etc/krb5.conf
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true"
export HADOOP_JAAS_DEBUG=true
export HADOOP_ROOT_LOGGER=DEBUG,console
cp ${TMP_KRB_LOC} /etc/krb5.conf
cp ${TMP_CORE_LOC} /opt/spark/hconf/core-site.xml
cp ${TMP_HDFS_LOC} /opt/spark/hconf/hdfs-site.xml
mkdir -p /etc/krb5.conf.d
/opt/spark/bin/spark-submit \
--deploy-mode cluster \
--class ${CLASS_NAME} \
--master k8s://${MASTER_URL} \
--conf spark.kubernetes.namespace=${NAMESPACE} \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-hdfs \
--conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
Contributor:
Adding files to the classpath does not do anything.

$ scala -cp /etc/krb5.conf
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = null

$ scala -cp /etc
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = file:/etc/krb5.conf

So this seems unnecessary, especially since I'd expect spark-submit or the k8s backend code to add the Hadoop conf to the driver's classpath somehow.

--conf spark.kubernetes.container.image=${BASE_SPARK_IMAGE} \
--conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
--conf spark.kerberos.keytab=/var/keytabs/hdfs.keytab \
--conf spark.kerberos.principal=hdfs/nn.${NAMESPACE}.svc.cluster.local@CLUSTER.LOCAL \
--conf spark.kubernetes.driver.label.spark-app-locator=${APP_LOCATOR_LABEL} \
${SUBMIT_RESOURCE} \
hdfs://nn.${NAMESPACE}.svc.cluster.local:9000/user/userone/people.txt
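Following up on the classpath comment above: if the conf files are needed on the driver classpath at all, it is the directories containing them, not the files themselves, that the JVM resolves. A sketch only, not part of the patch:

# Hypothetical alternative setting, passed as --conf "${EXTRA_CLASSPATH_CONF}"
# in place of the file-based spark.driver.extraClassPath value above.
EXTRA_CLASSPATH_CONF="spark.driver.extraClassPath=/opt/spark/hconf:/etc"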
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export PATH=/hadoop/bin:$PATH
export HADOOP_CONF_DIR=/hadoop/etc/hadoop
mkdir -p /hadoop/etc/data
cp ${TMP_KRB_LOC} /etc/krb5.conf
cp ${TMP_CORE_LOC} /hadoop/etc/hadoop/core-site.xml
cp ${TMP_HDFS_LOC} /hadoop/etc/hadoop/hdfs-site.xml

until kinit -kt /var/keytabs/hdfs.keytab hdfs/nn.${NAMESPACE}.svc.cluster.local; do sleep 15; done

echo "KDC is up and ready to go... starting up"

kdestroy

hdfs datanode
55 changes: 55 additions & 0 deletions resource-managers/kubernetes/docker/src/test/scripts/start-kdc.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export PATH=/hadoop/bin:$PATH
export HADOOP_CONF_DIR=/hadoop/etc/hadoop
mkdir -p /hadoop/etc/data
cp ${TMP_KRB_LOC} /etc/krb5.conf
cp ${TMP_CORE_LOC} /hadoop/etc/hadoop/core-site.xml
cp ${TMP_HDFS_LOC} /hadoop/etc/hadoop/hdfs-site.xml

/usr/sbin/kdb5_util -P changeme create -s


## password only user
/usr/sbin/kadmin.local -q "addprinc -randkey userone"
/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/userone.keytab userone"

/usr/sbin/kadmin.local -q "addprinc -randkey HTTP/server.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/server.keytab HTTP/server.${NAMESPACE}.svc.cluster.local"

/usr/sbin/kadmin.local -q "addprinc -randkey hdfs/nn.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "addprinc -randkey HTTP/nn.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "addprinc -randkey hdfs/dn1.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "addprinc -randkey HTTP/dn1.${NAMESPACE}.svc.cluster.local"

/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/hdfs.keytab hdfs/nn.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/hdfs.keytab HTTP/nn.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/hdfs.keytab hdfs/dn1.${NAMESPACE}.svc.cluster.local"
/usr/sbin/kadmin.local -q "ktadd -k /var/keytabs/hdfs.keytab HTTP/dn1.${NAMESPACE}.svc.cluster.local"

chown hdfs /var/keytabs/hdfs.keytab

keytool -genkey -alias nn.${NAMESPACE}.svc.cluster.local -keyalg rsa -keysize 1024 -dname "CN=nn.${NAMESPACE}.svc.cluster.local" -keypass changeme -keystore /var/keytabs/hdfs.jks -storepass changeme
keytool -genkey -alias dn1.${NAMESPACE}.svc.cluster.local -keyalg rsa -keysize 1024 -dname "CN=dn1.${NAMESPACE}.svc.cluster.local" -keypass changeme -keystore /var/keytabs/hdfs.jks -storepass changeme

chmod 700 /var/keytabs/hdfs.jks
chown hdfs /var/keytabs/hdfs.jks


krb5kdc -n
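A quick sanity check for the keytab this script populates, useful when debugging the KDC pod (not part of the patch):

# List the principals and timestamps stored in the generated HDFS keytab.
klist -k -t /var/keytabs/hdfs.keytab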