
Commit bb9e1d9

dongjoon-hyun authored and sarutak committed
[SPARK-37319][K8S] Support K8s image building with Java 17
### What changes were proposed in this pull request?

This PR aims to support K8s image building with Java 17. Please note that more work is needed before all tests run successfully.

### Why are the changes needed?

The `OpenJDK` Docker Hub image switched its underlying OS from `Debian` to `OracleLinux` as of Java 12, so `java_image_tag` no longer works.

**BEFORE**
```
$ bin/docker-image-tool.sh -n -b java_image_tag=17 build
[+] Building 0.8s (6/17)
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 37B  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/openjdk:17  0.4s
 => CACHED [ 1/13] FROM docker.io/library/openjdk:17@sha256:c7fffc2024948e6d75922025a17b7d81cb747fd0fe0167fef13c6fcfc72e4144  0.0s
 => [internal] load build context  0.1s
 => => transferring context: 69.25kB  0.0s
 => ERROR [ 2/13] RUN set -ex && sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && apt-get update && ln -s /li  0.2s
------
 > [ 2/13] RUN set -ex && sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && rm -rf /var/cache/apt/*:
#5 0.230 + sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list
#5 0.232 sed: can't read /etc/apt/sources.list: No such file or directory
------
executor failed running [/bin/sh -c set -ex && sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && rm -rf /var/cache/apt/*]: exit code: 2
Failed to build Spark JVM Docker image, please refer to Docker build output for details.
```

**AFTER (this PR, with the `-f` option)**
```
$ bin/docker-image-tool.sh -n -f kubernetes/dockerfiles/spark/Dockerfile.java17 build
[+] Building 29.3s (19/19) FINISHED
 => [internal] load build definition from Dockerfile.java17  0.0s
 => => transferring dockerfile: 2.49kB  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim  1.5s
 => [auth] library/debian:pull token for registry-1.docker.io  0.0s
 => [internal] load build context  0.1s
 => => transferring context: 80.54kB  0.0s
 => CACHED [ 1/13] FROM docker.io/library/debian:bullseye-slim@sha256:dddc0f5f01db7ca3599fd8cf9821ffc4d09ec9d7d15e49019e73228ac1eee7f9  0.0s
 => [ 2/13] RUN set -ex && apt-get update && ln -s /lib /lib64 && apt install -y bash tini libc6 libpam-modules krb5-user libnss3 proc  25.5s
 => [ 3/13] COPY jars /opt/spark/jars  0.4s
 => [ 4/13] COPY bin /opt/spark/bin  0.0s
 => [ 5/13] COPY sbin /opt/spark/sbin  0.0s
 => [ 6/13] COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/  0.0s
 => [ 7/13] COPY kubernetes/dockerfiles/spark/decom.sh /opt/  0.0s
 => [ 8/13] COPY examples /opt/spark/examples  0.0s
 => [ 9/13] COPY kubernetes/tests /opt/spark/tests  0.0s
 => [10/13] COPY data /opt/spark/data  0.0s
 => [11/13] WORKDIR /opt/spark/work-dir  0.0s
 => [12/13] RUN chmod g+w /opt/spark/work-dir  0.2s
 => [13/13] RUN chmod a+x /opt/decom.sh  0.2s
 => exporting to image  1.3s
 => => exporting layers  1.3s
 => => writing image sha256:ec961d957826c9b7eb4d00e900262130fc1708aef6cb51298b627d4bc91f834b  0.0s
 => => naming to docker.io/library/spark  0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
```

### Does this PR introduce _any_ user-facing change?

Yes, this adds a new Dockerfile that is exposed to users.

### How was this patch tested?

Pass the K8s IT building.

Closes #34586 from dongjoon-hyun/SPARK-37319.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
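For a quick local sanity check of the resulting image, here is one possible sketch (assuming the default `spark:latest` name shown in the build log above, and bypassing the image's `/opt/entrypoint.sh` entrypoint; this check is illustrative and not part of the PR):

```
# Build the Java 17 image from the top-level directory of a Spark distribution
# (same command as the AFTER log above).
bin/docker-image-tool.sh -n -f kubernetes/dockerfiles/spark/Dockerfile.java17 build

# Override the entrypoint and print the JRE version; with Debian's
# openjdk-17-jre package this should report an OpenJDK 17 runtime.
docker run --rm --entrypoint java spark:latest -version
```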
1 parent edbc7cf commit bb9e1d9

2 files changed: +70, -3 lines

2 files changed

+70
-3
lines changed

bin/docker-image-tool.sh

Lines changed: 8 additions & 3 deletions
@@ -232,7 +232,8 @@ Commands:
   push        Push a pre-built image to a registry. Requires a repository address to be provided.
 
 Options:
-  -f file               Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
+  -f file               (Optional) Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
+                        For Java 17, use `-f kubernetes/dockerfiles/spark/Dockerfile.java17`
   -p file               (Optional) Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
                         Skips building PySpark docker image if not specified.
   -R file               (Optional) Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
@@ -267,15 +268,19 @@ Examples:
     $0 -r docker.io/myrepo -t v2.3.0 build
     $0 -r docker.io/myrepo -t v2.3.0 push
 
-  - Build and push JDK11-based image with tag "v3.0.0" to docker.io/myrepo
+  - Build and push Java11-based image with tag "v3.0.0" to docker.io/myrepo
     $0 -r docker.io/myrepo -t v3.0.0 -b java_image_tag=11-jre-slim build
     $0 -r docker.io/myrepo -t v3.0.0 push
 
-  - Build and push JDK11-based image for multiple archs to docker.io/myrepo
+  - Build and push Java11-based image for multiple archs to docker.io/myrepo
     $0 -r docker.io/myrepo -t v3.0.0 -X -b java_image_tag=11-jre-slim build
     # Note: buildx, which does cross building, needs to do the push during build
     # So there is no separate push step with -X
 
+  - Build and push Java17-based image with tag "v3.3.0" to docker.io/myrepo
+    $0 -r docker.io/myrepo -t v3.3.0 -f kubernetes/dockerfiles/spark/Dockerfile.java17 build
+    $0 -r docker.io/myrepo -t v3.3.0 push
+
 EOF
 }
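By analogy with the Java11 multi-arch example in the help text above, a hedged sketch of a Java17 multi-arch build follows, assuming `-X` (buildx) composes with `-f` the same way it does with `-b` (this combination is not exercised in this commit):

```
# Hypothetical: cross-build and push a Java 17 image for multiple archs.
# With -X the push happens during the build, so there is no separate push step.
$0 -r docker.io/myrepo -t v3.3.0 -X -f kubernetes/dockerfiles/spark/Dockerfile.java17 build
```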

resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile.java17

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# We need to build from debian:bullseye-slim because openjdk switches its underlying OS
+# from debian to oraclelinux from openjdk:12
+FROM debian:bullseye-slim
+
+ARG spark_uid=185
+
+# Before building the docker image, first build and make a Spark distribution following
+# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
+# If this docker file is being used in the context of building your images from a Spark
+# distribution, the docker build command should be invoked from the top level directory
+# of the Spark distribution. E.g.:
+# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
+
+RUN set -ex && \
+    apt-get update && \
+    ln -s /lib /lib64 && \
+    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps openjdk-17-jre && \
+    mkdir -p /opt/spark && \
+    mkdir -p /opt/spark/examples && \
+    mkdir -p /opt/spark/work-dir && \
+    touch /opt/spark/RELEASE && \
+    rm /bin/sh && \
+    ln -sv /bin/bash /bin/sh && \
+    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
+    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
+    rm -rf /var/cache/apt/*
+
+COPY jars /opt/spark/jars
+COPY bin /opt/spark/bin
+COPY sbin /opt/spark/sbin
+COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
+COPY kubernetes/dockerfiles/spark/decom.sh /opt/
+COPY examples /opt/spark/examples
+COPY kubernetes/tests /opt/spark/tests
+COPY data /opt/spark/data
+
+ENV SPARK_HOME /opt/spark
+
+WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
+RUN chmod a+x /opt/decom.sh
+
+ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
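Following the pattern in the Dockerfile's own comment, here is a minimal sketch of building this image directly with `docker build`, assuming it is run from the top-level directory of a built Spark distribution (where the file ships as `kubernetes/dockerfiles/spark/Dockerfile.java17`); the `spark:java17` tag is an arbitrary illustrative choice:

```
# Run from the top-level directory of a Spark distribution.
# Same pattern as the generic example in the Dockerfile comment,
# pointed at the Java 17 variant.
docker build -t spark:java17 -f kubernetes/dockerfiles/spark/Dockerfile.java17 .
```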
