This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-433]Support Pre-built Jemalloc #434

Closed
wants to merge 4 commits into from
42 changes: 42 additions & 0 deletions kubernetes/README.txt
@@ -0,0 +1,42 @@
# Gazelle on Kubernetes
This README covers the Dockerfile scripts and the scripts to run the Thrift server for benchmarks such as TPC-H and TPC-DS.

## Building Spark Image
There are two methods to build the Gazelle Docker image.

### Prerequisite
Build the Spark base Docker image:
```
docker build --tag spark-centos:3.1.1 .
```

### Method 1: Building OAP Docker Image including Gazelle and other OAP projects
```
docker build --tag oap-centos:1.2 .
```

### Method 2: Building Gazelle Docker Image only

TBD

## Run Spark on Kubernetes
Before doing this, we assume you have set up your Kubernetes environment and that it works properly. All the tool scripts are under the "spark" folder.
We tested these scripts in a Minikube environment. If you are using another Kubernetes distribution, you may need to make some changes for them to work properly.
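
If you are starting from scratch, a minimal Minikube setup might look like the sketch below. The resource sizes are only illustrative, and the `minikube docker-env` step is an assumption about how locally built images such as spark-centos are made visible to the cluster without a registry.
```
# Start a local cluster with enough resources for Spark driver and executor pods
# (sizes are illustrative; adjust to your machine)
minikube start --cpus 4 --memory 8192

# Point the local docker CLI at Minikube's Docker daemon so images built here
# are visible to Kubernetes without pushing to a registry
eval $(minikube docker-env)
docker build --tag spark-centos:3.1.1 .
```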

### Create Spark User and Assign Cluster Role
Spark running on Kubernetes needs the edit role in your Kubernetes cluster in order to create driver and executor pods.
Go to the spark folder and execute the following command to create the "spark" user and assign the role. Make sure you have logged in to Kubernetes and have the administrator role for the cluster.
```
sh ./spark-kubernetes-prepare.sh
```
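
If you prefer to create the user and role binding by hand, the script is roughly equivalent to the kubectl commands below. This is only a sketch assuming the default namespace; the account and binding names are illustrative, so check spark-kubernetes-prepare.sh for the exact names it uses.
```
# Create a "spark" service account and grant it the edit role so it can
# create and delete driver/executor pods in the default namespace
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark \
  --namespace=default
```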

### Run Spark/OAP Job in Cluster mode
In Kubernetes, you can run a Spark/OAP job using spark-submit in cluster mode from any node that has access to your Kubernetes API server.

You can edit the Spark configuration files in the spark/conf directory.
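
The keys you are most likely to touch for cluster mode are the Kubernetes-specific ones. The snippet below is an illustrative spark-defaults.conf fragment, not the shipped defaults; the API server address, image tag, and instance count are assumptions you should replace with your own values.
```
spark.master                                             k8s://https://localhost:8443
spark.kubernetes.container.image                         oap-centos:1.2
spark.kubernetes.authenticate.driver.serviceAccountName  spark
spark.executor.instances                                 2
```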

#### Run Spark Pi Job
You can run a Spark Pi job as a simple test that the environment is working. Execute the following command. If you are running on the master node, you can omit the --master parameter.
For example:
```
sh ./spark-pi.sh --master localhost:8443 --image oap-centos:1.1.1 --spark_conf ./conf
```
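
Under the hood, spark-pi.sh is expected to wrap a spark-submit call similar to the sketch below. The class name, jar path, and conf keys shown here are assumptions based on the standard Spark 3.1.1 layout inside the image, so treat spark-pi.sh itself as the source of truth for the exact flags.
```
$SPARK_HOME/bin/spark-submit \
  --master k8s://https://localhost:8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=oap-centos:1.1.1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/home/spark-3.1.1/examples/jars/spark-examples_2.12-3.1.1.jar
```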
50 changes: 50 additions & 0 deletions kubernetes/docker/oap-centos/Dockerfile
@@ -0,0 +1,50 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM spark-centos:3.1.1

MAINTAINER The Optimized Analytics Package for Spark Platform (OAP) Authors https://github.com/Intel-bigdata/OAP/

#oap
ARG OAP_VERSION=1.2-SNAPSHOT
ENV OAP_VERSION ${OAP_VERSION}
ENV OAP_HOME /opt/home/conda/envs/oap-${OAP_VERSION}

USER root

ENV http_proxy http://10.239.4.80:913
ENV https_proxy http://10.239.4.80:913
ENV HTTP_PROXY http://10.239.4.80:913
ENV HTTPS_PROXY http://10.239.4.80:913

#install runtime prerequisites
RUN git config --global http.postBuffer 5m
RUN git config --global http.proxy http://10.239.4.80:913
RUN git config --global https.proxy https://10.239.4.80:913
RUN git config --global --add http.sslVersion tlsv1.2

RUN git clone https://github.com/oap-project/oap-tools.git oap_source && \
cd /opt/home/oap_source && \
sh /opt/home/oap_source/dev/install-runtime-dependencies.sh --with-rdma && \
rm -rf /opt/home/oap_source

# Install OAP conda packages
COPY oap.yml /opt/home/oap.yml
RUN /opt/home/conda/bin/conda config --system --set channel_priority flexible --set ssl_verify false --set proxy_servers.http http://child-prc.intel.com:913 --set proxy_servers.https http://child-prc.intel.com:913 && \
/opt/home/conda/bin/conda update -k -v -n base -c defaults conda --insecure && \
/opt/home/conda/bin/conda env create -k -v --file /opt/home/oap.yml --insecure

# Specify the User that the actual main process will run as
ARG spark_uid=185
USER ${spark_uid}
ENV PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig/:${PKG_CONFIG_PATH}
55 changes: 55 additions & 0 deletions kubernetes/docker/oap-centos/Dockerfile.back
@@ -0,0 +1,55 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM spark-centos:1.2

MAINTAINER The Optimized Analytics Package for Spark Platform (OAP) Authors https://github.com/Intel-bigdata/OAP/

#oap
ARG OAP_VERSION=1.2-SNAPSHOT
ENV OAP_VERSION ${OAP_VERSION}
ENV OAP_HOME /opt/home/conda/envs/oap-${OAP_VERSION}
ENV http_proxy=http://child-prc.intel.com:913
ENV https_proxy=https://child-prc.intel.com:913


USER root

#install runtime prerequisites
RUN git clone https://github.com/oap-project/oap-tools oap_source && \
cd /opt/home/oap_source && \
git checkout master && \
sh /opt/home/oap_source/dev/install-runtime-dependencies.sh --with-rdma && \
rm -rf /opt/home/oap_source

# Install OAP conda packages
COPY oap.yml /opt/home/oap.yml
ARG http_proxy=http://child-prc.intel.com:913
ARG https_proxy=http://child-prc.intel.com:913
ENV HTTP_PROXY http://child-prc.intel.com:913
ENV HTTPS_PROXY http://child-prc.intel.com:913
RUN /opt/home/conda/bin/conda config --set ssl_verify True

RUN yum install -y ca-certificates && update-ca-trust enable && update-ca-trust extract
ENV REQUESTS_CA_BUNDLE /etc/pki/tls/certs/ca-bundle.crt
ENV SSL_CERT_FILE /etc/pki/tls/certs/ca-bundle.crt

RUN yum install -y python-urllib3

RUN /opt/home/conda/bin/conda config --system --set channel_priority flexible --set proxy_servers.http http://child-prc.intel.com:913 --set proxy_servers.https https://child-prc.intel.com:913 && \
/opt/home/conda/bin/conda env create -v --file /opt/home/oap.yml

# Specify the User that the actual main process will run as
ARG spark_uid=185
USER ${spark_uid}
ENV PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig/:${PKG_CONFIG_PATH}
8 changes: 8 additions & 0 deletions kubernetes/docker/oap-centos/oap.yml
@@ -0,0 +1,8 @@
name: oap-1.2-SNAPSHOT
channels:
- intel
- conda-forge
- defaults
dependencies:
- python=3.7.7
- oap=1.1.1
107 changes: 107 additions & 0 deletions kubernetes/docker/spark-centos/Dockerfile
@@ -0,0 +1,107 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM centos:centos7.6.1810 as builder

RUN yum install -y \
wget \
libdigest-sha-perl \
bzip2

ENV HTTP_PROXY http://child-prc.intel.com:913
ENV HTTPS_PROXY http://child-prc.intel.com:913

# Download miniconda 4.5.12, then upgrade it to 4.8.4
RUN wget --secure-protocol=TLSv1_2 --output-document miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh \
&& (echo '866ae9dff53ad0874e1d1a60b1ad1ef8 miniconda.sh' | md5sum -c) \
# Conda must be installed at /opt/home/conda
&& /bin/bash miniconda.sh -b -p /opt/home/conda \
&& rm miniconda.sh \
&& /opt/home/conda/bin/conda install --name base conda=4.8.4

FROM centos:centos7.6.1810

MAINTAINER The Optimized Analytics Package for Spark Platform (OAP) Authors https://github.com/Intel-bigdata/OAP/

WORKDIR /opt/home

RUN yum install -y \
curl \
wget \
unzip \
maven \
git \
&& rm -rf /tmp/* /var/tmp/*

#python & conda
COPY --from=builder /opt/home/conda /opt/home/conda

# Source conda.sh for all login shells.
RUN ln -s /opt/home/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh

# Conda recommends using strict channel priority to speed up conda operations and reduce package incompatibility problems.
# Set always_yes to avoid needing -y flags, and improve conda experience in Databricks notebooks.
RUN /opt/home/conda/bin/conda config --system --set channel_priority strict \
&& /opt/home/conda/bin/conda config --system --set always_yes True

#java
ENV JAVA_HOME /opt/home/jdk
ENV PATH ${JAVA_HOME}/bin:${PATH}
RUN wget https://enos.itcollege.ee/~jpoial/allalaadimised/jdk8/jdk-8u291-linux-x64.tar.gz && \
gunzip jdk-8u291-linux-x64.tar.gz && \
tar -xf jdk-8u291-linux-x64.tar -C /opt/home && \
rm jdk-8u291-linux-x64.tar && \
ln -s /opt/home/jdk1.8.0_291 /opt/home/jdk

ARG SPARK_VERSION=3.1.1

ENV SPARK_VERSION ${SPARK_VERSION}
ENV SPARK_HOME /opt/home/spark-${SPARK_VERSION}

#spark download
RUN wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz && \
tar -zxvf spark-${SPARK_VERSION}-bin-hadoop2.7.tgz && \
mv spark-${SPARK_VERSION}-bin-hadoop2.7 spark-${SPARK_VERSION} && \
rm spark-${SPARK_VERSION}-bin-hadoop2.7.tgz

#run
ARG spark_uid=185

ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /opt/home/tini
RUN chmod +x /opt/home/tini

RUN set -ex && \
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
rm -rf /var/cache/apt/* && \
chmod g+w /opt/home && \
mkdir -p /opt/home/logs && \
chmod g+w /opt/home/logs

ENV HOME_DIR /opt/home
ENV SPARK_LOG_DIR /opt/home/logs
ENV SPARK_CONF_DIR /opt/home/conf

COPY entrypoint.sh /opt/home/entrypoint.sh
COPY entrypoint-nop.sh /opt/home/entrypoint-nop.sh

COPY spark-thrift-server.sh /opt/home/spark-thrift-server.sh
COPY spark-sql.sh /opt/home/spark-sql.sh
COPY spark-shell.sh /opt/home/spark-shell.sh
COPY spark-submit.sh /opt/home/spark-submit.sh

ENTRYPOINT [ "/opt/home/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
17 changes: 17 additions & 0 deletions kubernetes/docker/spark-centos/entrypoint-nop.sh
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

tail -f $SPARK_HOME/conf/spark-defaults.conf.template
102 changes: 102 additions & 0 deletions kubernetes/docker/spark-centos/entrypoint.sh
@@ -0,0 +1,102 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# echo commands to the terminal output
set -ex

# Check whether there is a passwd entry for the container UID
myuid=$(id -u)
mygid=$(id -g)
# turn off -e for getent because it will return error code in anonymous uid case
set +e
uidentry=$(getent passwd $myuid)
set -e

# If there is no passwd entry for the container UID, attempt to create one
if [ -z "$uidentry" ] ; then
if [ -w /etc/passwd ] ; then
echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
else
echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
fi
fi

SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt

if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
fi

if [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "2" ]; then
pyv="$(python -V 2>&1)"
export PYTHON_VERSION="${pyv:7}"
export PYSPARK_PYTHON="python"
export PYSPARK_DRIVER_PYTHON="python"
elif [ "$PYSPARK_MAJOR_PYTHON_VERSION" == "3" ]; then
pyv3="$(python3 -V 2>&1)"
export PYTHON_VERSION="${pyv3:7}"
export PYSPARK_PYTHON="python3"
export PYSPARK_DRIVER_PYTHON="python3"
fi

# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s.
if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
fi

if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
fi

case "$1" in
driver)
shift 1
CMD=(
"$SPARK_HOME/bin/spark-submit"
--conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
--deploy-mode client
"$@"
)
;;
executor)
shift 1
CMD=(
${JAVA_HOME}/bin/java
"${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-Xms$SPARK_EXECUTOR_MEMORY
-Xmx$SPARK_EXECUTOR_MEMORY
-cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
org.apache.spark.executor.CoarseGrainedExecutorBackend
--driver-url $SPARK_DRIVER_URL
--executor-id $SPARK_EXECUTOR_ID
--cores $SPARK_EXECUTOR_CORES
--app-id $SPARK_APPLICATION_ID
--hostname $SPARK_EXECUTOR_POD_IP
)
;;

*)
echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
CMD=("$@")
;;
esac

# Execute the container CMD under tini for better hygiene
exec /opt/home/tini -s -- "${CMD[@]}"
5 changes: 5 additions & 0 deletions kubernetes/docker/spark-centos/env.yml
@@ -0,0 +1,5 @@
name: oap
channels:
- defaults
dependencies:
- python=3.7.7