Byzer helm project is desighed to help the users deploy Byzer-lang and Byzer Notebook in Kubernetes.
- helm >= 3.9
helm repo add byzer http://store.mlsql.tech/charts
helm repo update
helm install -n byzer --create-namespace bz byzer/Byzer-lang \
--set clusterUrl=https://192.168.3.42:16443 \
--set fs.defaultFS=oss://xxxx \
--set fs.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem \
--set fs."oss\.endpoint"=oss-cn-hangzhou.aliyuncs.com \
--set fs."oss\.accessKeyId"=xxxx \
--set fs."oss\.accessKeySecret"=xxxxx \
--set image.repository="registry.cn-shanghai.aliyuncs.com/kyligence-byzer/byzer-lang-k8s-full" \
--set image.tag="3.3.0-2.4.0-SNAPSHOT-2023-02-16"
helm install -n byzer --create-namespace bz byzer/Byzer-lang --values values.bz.yaml
The content of values.bz.yaml
:
clusterUrl: https://192.168.3.42:16443
fs:
cloud:
storage:
enabled: true
oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
defaultFS: oss://xxxxxx
oss.endpoint: oss-cn-hangzhou.aliyuncs.com
oss.accessKeyId: xxxxx
oss.accessKeySecret: xxxxxx
image:
repository: registry.cn-shanghai.aliyuncs.com/kyligence-byzer/byzer-lang-k8s-full
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: "3.3.0-2.4.0-SNAPSHOT-2023-02-16"
The more configuration examples of object store e.g. S3, Blob,Cos please check the values.example.yaml
If you want to deploy multi byzer-lang Engine instance, run it with different name and serviceAccount. For example. For example:
helm install -n byzer --create-namespace bmz byzer/Byzer-lang \
--set serviceAccount.name=byzer-bmz
...
Here we set name as bmz
and set serviceAccount.name as byzer-bmz.
Here is the default configuration:
spark:
driver.memory: 4g
driver.cores: 2
driver.maxResultSize: 1g
driver.memoryOverheadFactor: 0.4
executor.memory: 1g
executor.cores: 1
executor.instances: 1
executor.memoryOverheadFactor: 0.4
The requests/limits of memory resource in Kubenertes are caculated with following fomular:
driver: ceil (4g * 0.4(memoryOverheadFactor) + 4g) = 6Gi
executor: (1g * 0.4(memoryOverheadFactor) + 1g) = 1400Mi
Example snippet like following:
spark:
shuffle.manager: org.apache.spark.shuffle.rss.RssShuffleManager
rss.master.address: xxxxx:9097
serializer: org.apache.spark.serializer.KryoSerializer
rss.shuffle.writer.mode: hash
rss.push.data.replicate: true
shuffle.service.enabled: false
sql.adaptive.localShuffleReader.enabled: false
sql.adaptive.enabled: true
sql.adaptive.skewJoin.enabled: true
Example snippet like following:
spark:
ui.prometheus.enabled: true
metrics.appStatusSource.enabled: true
kubernetes.driver.annotation.executors.prometheus.io/scrape: true
kubernetes.driver.annotation.executors.prometheus.io/path: /metrics/executors/prometheus
kubernetes.driver.annotation.executors.prometheus.io/port: 4040
kubernetes.driver.annotation.jmx.prometheus.io/scrape: true
kubernetes.driver.annotation.jmx.prometheus.io/path: /metrics
kubernetes.driver.annotation.jmx.prometheus.io/port: 8080
kubernetes.executor.annotation.jmx.prometheus.io/scrape: true
kubernetes.executor.annotation.jmx.prometheus.io/path: /metrics
kubernetes.executor.annotation.jmx.prometheus.io/port: 8080
metrics.conf.*.sink.jmx.class: org.apache.spark.metrics.sink.JmxSink
Suppose the user want to add a new jar to Byzer-lang Image, the first step is to create an new dockerfile. The content like following:
# The base image includes
# 1. OpenJDK8
# 2. Spark 3.3.0-bin-hadoop3
# Byzer-lang
# Byzer-lang plugins
# azure-hadoop shade jar
# Directory structure
# |- /work
# |--- spark
# |--- jdk
# |- /home
# |--- deploy
# |------ byzer-lang
# |-------- plugin
# |-------- libs
# |-------- main
# |-------- bin
# |-------- logs
ARG TAG
FROM byzer/byzer-lang-k8s-base:$TAG
ADD https://download.byzer.org/byzer/misc/cloud/oss/byzer-objectstore-oss-3.3_2.12-0.1.0-SNAPSHOT.jar /home/deploy/byzer-lang/libs/
ADD https://download.byzer.org/byzer/misc/rss-shuffle-manager-1.0.0-shaded.jar $SPARK_HOME/jars
Then build:
export BYZER_LANG_VERSION=${BYZER_LANG_VERSION:-2.4.0-SNAPSHOT}
export SPARK_VERSION=${SPARK_VERSION:-3.3.0}
docker build -t byzer/byzer-lang-k8s-full:"${SPARK_VERSION}-${BYZER_LANG_VERSION}" \
--build-arg TAG="${SPARK_VERSION}-${BYZER_LANG_VERSION}" \
-f "/byzer/full/Dockerfile" \
"/byzer/full"
helm install -n byzer --create-namespace nb byzer/Byzer-notebook \
--set name=nb-byzer-notebook \
--set engine=bz-byzer-lang-service \
--set domain=cluster.local \
--set notebook."database\.ip"=192.168.3.14 \
--set notebook."database\.username"=xxx \
--set notebook."database\.password"=xxxx \
--set image.repository="registry.cn-shanghai.aliyuncs.com/kyligence-byzer/byzer-notebook" \
--set image.tag="1.2.3-2023-02-16"
Notice that the name is the Byzer Notebook service name, and the engine is the byzer engine service name which the byzer notebook try to connect.