[Heron-3724] Separate the Manager and Executors. #3741
Phase 2
Individual Pod Template loading for the Manager and Executors.

Commands:

~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--verbose \
--deploy-deactivated \
--config-property heron.kubernetes.executor.pod.template=pod-templ-executor.pod-template-executor.yaml \
--config-property heron.kubernetes.manager.pod.template=pod-templ-manager.pod-template-manager.yaml \
--config-property heron.kubernetes.manager.limits.cpu=2 \
--config-property heron.kubernetes.manager.limits.memory=3 \
--config-property heron.kubernetes.manager.requests.cpu=1 \
--config-property heron.kubernetes.manager.requests.memory=2 \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.claimName=OnDemand \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.sizeLimit=256Gi \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.volumeMode=Block \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.claimName=OnDemand \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.storageClassName=storage-class-name \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.sizeLimit=512Gi \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.volumeMode=Block \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.subPath=sub/path/to/mount/shared/volume

Executor StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-11-29T00:14:41Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-executors
namespace: default
resourceVersion: "2512"
uid: f4aa0815-256d-4c73-8ce7-68ff0bb26597
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-7629098208556017113.tar.gz
. && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking6e4ded2b-ee0a-40ac-90ad-8780645bda9a
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one
value: variable one
- name: var_three
value: variable three
- name: var_two
value: variable two
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: executor
ports:
- containerPort: 5555
name: tcp-port-kept
protocol: TCP
- containerPort: 5556
name: udp-port-kept
protocol: UDP
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
resources:
limits:
cpu: "3"
memory: 4Gi
requests:
cpu: "3"
memory: 4Gi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: dynamicvolume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: /shared_volume
name: shared-volume
- mountPath: path/to/mount/shared/volume
name: sharedvolume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: staticvolume
subPath: sub/path/to/mount/static/volume
- image: alpine
imagePullPolicy: Always
name: sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume
name: shared-volume
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- emptyDir: {}
name: shared-volume
- name: sharedvolume
persistentVolumeClaim:
claimName: requested-claim-by-user
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: dynamicvolume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: staticvolume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 2
currentRevision: acking-executors-bc4fd98c4
observedGeneration: 1
replicas: 2
updateRevision: acking-executors-bc4fd98c4
updatedReplicas: 2

Manager StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-11-29T00:14:41Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-manager
namespace: default
resourceVersion: "2513"
uid: 4e8e0e7a-8d20-4a1f-8cac-db9319af1cec
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-7629098208556017113.tar.gz
. && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking6e4ded2b-ee0a-40ac-90ad-8780645bda9a
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one_manager
value: variable one on manager
- name: var_three_manager
value: variable three on manager
- name: var_two_manager
value: variable two on manager
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: manager
ports:
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
- containerPort: 7775
name: tcp-port-kept
protocol: TCP
- containerPort: 7776
name: udp-port-kept
protocol: UDP
resources:
limits:
cpu: "2"
memory: 3Gi
requests:
cpu: "1"
memory: 2Gi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: dynamicvolume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: /shared_volume/manager
name: shared-volume-manager
- mountPath: path/to/mount/shared/volume
name: sharedvolume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: staticvolume
subPath: sub/path/to/mount/static/volume
- image: alpine
imagePullPolicy: Always
name: manager-sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume/manager
name: shared-volume-manager
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- emptyDir: {}
name: shared-volume-manager
- name: sharedvolume
persistentVolumeClaim:
claimName: requested-claim-by-user
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: dynamicvolume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: staticvolume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 1
currentRevision: acking-manager-677c8b875b
observedGeneration: 1
replicas: 1
updateRevision: acking-manager-677c8b875b
updatedReplicas: 1
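The executor and manager container commands above differ mainly in how SHARD_ID is derived from the pod name. A minimal sketch of that arithmetic, using hypothetical pod names (Heron's actual pods follow the <statefulset-name>-<ordinal> convention):

```shell
# Executor pods: shard = pod ordinal + 1, since shard 0 is reserved
# for the manager after the two StatefulSets were separated.
POD_NAME="acking-executors-0"                 # hypothetical executor pod
EXECUTOR_SHARD_ID=$((${POD_NAME##*-} + 1))    # strip up to last '-', add 1

# Manager pod: shard = pod ordinal, which is always 0 for its single replica.
MANAGER_POD_NAME="acking-manager-0"           # hypothetical manager pod
MANAGER_SHARD_ID=${MANAGER_POD_NAME##*-}

echo "executor shard=${EXECUTOR_SHARD_ID} manager shard=${MANAGER_SHARD_ID}"
```

This keeps shard IDs unique across both StatefulSets without any coordination between them.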
Phase 2
Please note that the commands have changed to load Pod Templates and PVCs, so the docs are now mostly obsolete.

Commands:

~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--verbose \
--deploy-deactivated \
--config-property heron.kubernetes.executor.pod.template=pod-templ-executor.pod-template-executor.yaml \
--config-property heron.kubernetes.manager.pod.template=pod-templ-manager.pod-template-manager.yaml \
--config-property heron.kubernetes.manager.limits.cpu=2 \
--config-property heron.kubernetes.manager.limits.memory=3 \
--config-property heron.kubernetes.manager.requests.cpu=1 \
--config-property heron.kubernetes.manager.requests.memory=2 \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.claimName=OnDemand \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.sizeLimit=256Gi \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.volumeMode=Block \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.claimName=OnDemand \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.storageClassName=storage-class-name \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.sizeLimit=512Gi \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.volumeMode=Block \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.subPath=sub/path/to/mount/shared/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.claimName=OnDemand \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.sizeLimit=256Gi \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.volumeMode=Block \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.claimName=OnDemand \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.storageClassName=storage-class-name \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.sizeLimit=512Gi \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.volumeMode=Block \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.subPath=sub/path/to/mount/shared/volume

Executor StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-11-30T00:08:01Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-executors
namespace: default
resourceVersion: "1650"
uid: 24e8e2fc-fc33-4189-996c-dce430bcc68f
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0--1632273069134658892.tar.gz
. && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking60a8ecb7-e031-4afc-9bff-8a18703aef3a
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one
value: variable one
- name: var_three
value: variable three
- name: var_two
value: variable two
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: executor
ports:
- containerPort: 5555
name: tcp-port-kept
protocol: TCP
- containerPort: 5556
name: udp-port-kept
protocol: UDP
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
resources:
limits:
cpu: "3"
memory: 4Gi
requests:
cpu: "3"
memory: 4Gi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: executor-dynamic-volume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: path/to/mount/shared/volume
name: executor-shared-volume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: executor-static-volume
subPath: sub/path/to/mount/static/volume
- mountPath: /shared_volume
name: shared-volume
- image: alpine
imagePullPolicy: Always
name: sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume
name: shared-volume
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- name: executor-shared-volume
persistentVolumeClaim:
claimName: requested-claim-by-user
- emptyDir: {}
name: shared-volume
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: executor-dynamic-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: executor-static-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 2
currentRevision: acking-executors-648bfd4494
observedGeneration: 1
replicas: 2
updateRevision: acking-executors-648bfd4494
updatedReplicas: 2

Manager StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-11-30T00:08:01Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-manager
namespace: default
resourceVersion: "1637"
uid: 84f96cb2-093a-47d7-8882-98cf7833219d
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0--1632273069134658892.tar.gz
. && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking60a8ecb7-e031-4afc-9bff-8a18703aef3a
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one_manager
value: variable one on manager
- name: var_three_manager
value: variable three on manager
- name: var_two_manager
value: variable two on manager
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: manager
ports:
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
- containerPort: 7775
name: tcp-port-kept
protocol: TCP
- containerPort: 7776
name: udp-port-kept
protocol: UDP
resources:
limits:
cpu: "2"
memory: 3Gi
requests:
cpu: "1"
memory: 2Gi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: manager-dynamic-volume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: path/to/mount/shared/volume
name: manager-shared-volume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: manager-static-volume
subPath: sub/path/to/mount/static/volume
- mountPath: /shared_volume/manager
name: shared-volume-manager
- image: alpine
imagePullPolicy: Always
name: manager-sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume/manager
name: shared-volume-manager
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- name: manager-shared-volume
persistentVolumeClaim:
claimName: requested-claim-by-user
- emptyDir: {}
name: shared-volume-manager
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: manager-static-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: manager-dynamic-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 1
currentRevision: acking-manager-56cff7454d
observedGeneration: 1
replicas: 1
updateRevision: acking-manager-56cff7454d
updatedReplicas: 1
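The Phase 2 volume keys above are now prefixed with the target process, so the scheduler can route each PVC to the correct StatefulSet. A sketch of the key layout, using one of the keys from the command (the parsing here is illustrative only, not the scheduler's implementation):

```shell
# heron.kubernetes.<process>.volumes.persistentVolumeClaim.<volume>.<option>
KEY="heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.sizeLimit"

PROCESS=$(echo "$KEY" | cut -d. -f3)   # "executor" or "manager"
VOLUME=$(echo "$KEY" | cut -d. -f6)    # user-chosen volume name
OPTION=$(echo "$KEY" | cut -d. -f7)    # PVC option, e.g. sizeLimit

echo "process=$PROCESS volume=$VOLUME option=$OPTION"
```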
Added the Javadocs that were previously missing for all the functions I touched.
Fixed typos for "topology's".
Updated logic to switch between manager and executor mode in <setShardIdEnvironmentVariableCommand>.
<getExecutorCommand> updated to support both executor and manager StatefulSets.
<createStatefulSet>: Updated name in metadata to <name-executors>. Reduced replica count by 1 to isolate Manager.
<getManagerLimits> collects limits by prefix.
<createStatefulSetManager>: Copy executor container. Set replica count to 1. Update StatefulSet name to <name-manager>. Update container name to <manager>. Update container command to set <shard_id>. Set <requests> to null. Limits are set if available.
<testCreateStatefulSetManager> tests the copying and setup of the Manager. TODO: limits testing; the test is currently broken.
Wired logic into <Submit> to deploy the Manager.
<createTopologySelectorLabels> will generate labels of the form "app=heron,topology=topology-name".
Updated <deleteStatefulSets> to use a match label to find all of the topology's StatefulSets.
<getExecutorStatefulSetName> generates the name used by the Executor StatefulSet.
Updated to retrieve the Executor StatefulSet: <patchStatefulSetReplicas> <getStatefulSet>
Updated Helm chart for StatefulSet permissions: - deletecollection
<deleteStatefulSets> propagation policy not required.
Added labels to StatefulSet in <createStatefulSet>.
<createStatefulSetManager> using keys from constants for "CPU" and "MEMORY" for consistency.
<createStatefulSetManager> sets Requests equal to Limits when Limits are supplied via CLI.
Added Requests CLI command.
<createResourcesRequirement> will generate Requirements from configurations.
<createResourcesRequirement> will return null or only valid Resource maps.
Wired <createResourcesRequirement> into <createStatefulSetManager> to add Resources only when required.
<createStatefulSetManager> only adds Resources if they are valid and available.
<createStatefulSetManager> will set Requests to Limits if a Limit is provided without a Request.
<createStatefulSetManager> only adds Resources if they are valid and available.
--config-property heron.kubernetes.manager.limits.memory=3 |
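The manager/executor shard-ID switch mentioned in these notes shows up in the generated container commands later in this thread; a minimal shell sketch of the two derivations (the pod names here are hypothetical):

```shell
#!/bin/sh
# Sketch of the shard-ID derivation used in the generated container commands.
# The manager pod maps its ordinal directly to its shard ID, while executor
# pods are offset by +1 so that shard 0 stays reserved for the manager.

POD_NAME="acking-manager-0"                  # hypothetical manager pod name
MANAGER_SHARD_ID=${POD_NAME##*-}             # strip everything up to the last '-'
echo "manager shardId=${MANAGER_SHARD_ID}"   # prints: manager shardId=0

POD_NAME="acking-executors-0"                # hypothetical executor pod name
EXECUTOR_SHARD_ID=$(( ${POD_NAME##*-} + 1 )) # ordinal 0 becomes shard 1
echo "executor shardId=${EXECUTOR_SHARD_ID}" # prints: executor shardId=1
```

The same `${POD_NAME##*-}` expansion appears verbatim in both StatefulSet container commands; only the `+ 1` offset differs.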
Corrected typos and updated details in all sections.
Thank you @nicknezis and @windhamwong for taking the time to test and review this PR 😄. I have pushed some fixes to the documentation, but the code remains unchanged.
You would need a large and diverse dataset to form a baseline for the default resource values. I feel we would need to solicit statistics from users for topologies with varying configurations (bolts and spouts), as well as varying data velocity and volume.
The reason I chose Megabytes is that we typically use Gigabytes as the units when working with memory (volatile and non-volatile), and we may need to work with fractions of a Gigabyte. I do not feel we need the granularity of Kilobytes when working with memory, and it would make the command tedious to use (e.g. …). Edit: I think we are good to merge.
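For reference, the Kubernetes binary suffixes adopted later in the thread (Ki, Mi, Gi) scale by powers of 1024, so fractional-Gigabyte values map cleanly onto Mi quantities; a quick sketch of the arithmetic:

```shell
#!/bin/sh
# Binary multiples behind the Kubernetes quantity suffixes Ki, Mi, and Gi.
KI=1024
MI=$(( 1024 * KI ))
GI=$(( 1024 * MI ))

echo "300Mi = $(( 300 * MI )) bytes"   # prints: 300Mi = 314572800 bytes
echo "3Gi   = $(( 3 * GI )) bytes"     # prints: 3Gi   = 3221225472 bytes
```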
Nice work @surahman. All, let's wait until 24 hours after @nicknezis's approval before merging.
So the bolts, spouts, and stmgr processes all live in the executor pods. Data velocity and the like are all in the executor. The Manager just has a few processes that collect metrics and manage coordination of checkpointing (if used). Also, if the physical plan changes, it will coordinate changes through ZooKeeper. (It does more, but I am trying to give examples of the types of operations.) I still think we can default to a smaller value with low risk, but I agree the risk is not 0%. After this is merged, I'll do some analysis on workloads at work to see what could be a good default. The numbers @windhamwong provided also give me confidence, because they match what I've observed. But I'll follow a more rigorous process to capture numbers. None of this counters the decision to merge as is; I agree with @surahman's decision.
I agree. Once we are sure of the changes which need to be made, we can go ahead and figure out how to effect this change in a future PR. I have one last set of typo corrections to the documentation that I will merge in later today.
Corrected typos and added details.
<createResourcesRequirement> uses native Kubernetes unit suffixes.
@nicknezis I have updated the units to the native Kubernetes suffixes. This should fill @windhamwong's request for Kilobyte support.

Commands

~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--verbose \
--deploy-deactivated \
--config-property heron.kubernetes.executor.pod.template=pod-templ-executor.pod-template-executor.yaml \
--config-property heron.kubernetes.manager.pod.template=pod-templ-manager.pod-template-manager.yaml \
--config-property heron.kubernetes.manager.limits.cpu=2000m \
--config-property heron.kubernetes.manager.limits.memory=300Mi \
--config-property heron.kubernetes.manager.requests.cpu=1000m \
--config-property heron.kubernetes.manager.requests.memory=200Mi \
--config-property heron.kubernetes.executor.limits.cpu=5 \
--config-property heron.kubernetes.executor.limits.memory=6Gi \
--config-property heron.kubernetes.executor.requests.cpu=2 \
--config-property heron.kubernetes.executor.requests.memory=1Gi \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.claimName=OnDemand \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.sizeLimit=256Gi \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.volumeMode=Block \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.claimName=OnDemand \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.storageClassName=storage-class-name \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.sizeLimit=512Gi \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.volumeMode=Block \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.subPath=sub/path/to/mount/shared/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.claimName=OnDemand \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.sizeLimit=256Gi \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.volumeMode=Block \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.claimName=OnDemand \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.storageClassName=storage-class-name \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.sizeLimit=512Gi \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.volumeMode=Block \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.subPath=sub/path/to/mount/shared/volume

Manager StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-12-03T22:36:48Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-manager
namespace: default
resourceVersion: "787"
uid: d93e7e8d-e690-4e72-96bd-2b327fff9ecc
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-1268791470655715640.tar.gz
. && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking5d5d16b0-7b36-4662-9690-658afec32555
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one_manager
value: variable one on manager
- name: var_three_manager
value: variable three on manager
- name: var_two_manager
value: variable two on manager
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: manager
ports:
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
- containerPort: 7775
name: tcp-port-kept
protocol: TCP
- containerPort: 7776
name: udp-port-kept
protocol: UDP
resources:
limits:
cpu: "2"
memory: 300Mi
requests:
cpu: "1"
memory: 200Mi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: manager-dynamic-volume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: path/to/mount/shared/volume
name: manager-shared-volume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: manager-static-volume
subPath: sub/path/to/mount/static/volume
- mountPath: /shared_volume/manager
name: shared-volume-manager
- image: alpine
imagePullPolicy: Always
name: manager-sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume/manager
name: shared-volume-manager
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- name: manager-shared-volume
persistentVolumeClaim:
claimName: requested-claim-by-user
- emptyDir: {}
name: shared-volume-manager
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: manager-static-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: manager-dynamic-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 1
currentRevision: acking-manager-5f576f75cc
observedGeneration: 1
replicas: 1
updateRevision: acking-manager-5f576f75cc
updatedReplicas: 1

Executor StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-12-03T22:36:48Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-executors
namespace: default
resourceVersion: "789"
uid: ce141b3b-b7f6-43ba-8442-57c63b528be3
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-1268791470655715640.tar.gz
. && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking5d5d16b0-7b36-4662-9690-658afec32555
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: var_one
value: variable one
- name: var_three
value: variable three
- name: var_two
value: variable two
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: executor
ports:
- containerPort: 5555
name: tcp-port-kept
protocol: TCP
- containerPort: 5556
name: udp-port-kept
protocol: UDP
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
resources:
limits:
cpu: "5"
memory: 6Gi
requests:
cpu: "2"
memory: 1Gi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: path/to/mount/dynamic/volume
name: executor-dynamic-volume
subPath: sub/path/to/mount/dynamic/volume
- mountPath: path/to/mount/shared/volume
name: executor-shared-volume
subPath: sub/path/to/mount/shared/volume
- mountPath: path/to/mount/static/volume
name: executor-static-volume
subPath: sub/path/to/mount/static/volume
- mountPath: /shared_volume
name: shared-volume
- image: alpine
imagePullPolicy: Always
name: sidecar-container
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /shared_volume
name: shared-volume
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
volumes:
- name: executor-shared-volume
persistentVolumeClaim:
claimName: requested-claim-by-user
- emptyDir: {}
name: shared-volume
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: executor-dynamic-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 256Gi
volumeMode: Block
status:
phase: Pending
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
onDemand: "true"
topology: acking
name: executor-static-volume
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
resources:
requests:
storage: 512Gi
storageClassName: storage-class-name
volumeMode: Block
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 2
currentRevision: acking-executors-675c888b5
observedGeneration: 1
replicas: 2
updateRevision: acking-executors-675c888b5
updatedReplicas: 2
Thanks. Let me test it out over the weekend.
I ran a test with …
Thank you, Nick; let us give Windham some time to test this out before merging.
I'm testing on the build, and wondering about the need for …
I don't have a problem with the PVC, but just wondering: do we need
as we already got
in each topology. Will there be a way to implement the config in topology constants so we can change the value for a specific topology?
I think we have another bug here. Since the Python version in use under your PR branch is 3.8, the Python library kazoo (used for the ZooKeeper connection) raises warnings. Heron Tracker uses kazoo 2.7.0, but Python 3.8 is not compatible with it, and kazoo has to be upgraded to 2.8.0.
We shall have another PR for this :D
@windhamwong It is not required, and if not supplied a default … is used.
This functionality mirrors what is available in Spark and permits deployment-time tweaking of resources without having to repackage a …
Good catch; please open a new issue if you have not already. These are not changes that were introduced in this PR, and they are most likely associated with updates to facilitate building on …
Please double-check your commands and script. I see a trailing …
Here is my run on …:

~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--verbose \
--config-property heron.kubernetes.manager.limits.cpu=300m \
--config-property heron.kubernetes.manager.limits.memory=300Mi \
--config-property heron.kubernetes.manager.requests.cpu=20m \
--config-property heron.kubernetes.manager.requests.memory=100Mi

Manager StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-12-05T16:12:40Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-manager
namespace: default
resourceVersion: "858"
uid: 6be69dad-2943-4813-9c26-7ed5e185a0e1
spec:
podManagementPolicy: Parallel
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0--4532610184000198972.tar.gz
. && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking2d4d1f63-90db-435d-9e2b-6be7f5bfc0ee
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: manager
ports:
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
resources:
limits:
cpu: 300m
memory: 300Mi
requests:
cpu: 20m
memory: 100Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
status:
availableReplicas: 1
collisionCount: 0
currentReplicas: 1
currentRevision: acking-manager-f5f4784
observedGeneration: 1
readyReplicas: 1
replicas: 1
updateRevision: acking-manager-f5f4784
updatedReplicas: 1

Executor StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2021-12-05T16:12:40Z"
generation: 1
labels:
app: heron
topology: acking
name: acking-executors
namespace: default
resourceVersion: "862"
uid: afe8b301-953d-49fa-af05-37289f9cf721
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: heron
topology: acking
serviceName: acking
template:
metadata:
annotations:
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: heron
topology: acking
spec:
containers:
- command:
- sh
- -c
- './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0--4532610184000198972.tar.gz
. && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
--topology-name=acking --topology-id=acking2d4d1f63-90db-435d-9e2b-6be7f5bfc0ee
--topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
--state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
--tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
--metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
--classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
--override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
--component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
--heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
--cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
--metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
--python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
--metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
--is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
--stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
--health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
--shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
--shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
--metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
env:
- name: HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: apache/heron:testbuild
imagePullPolicy: IfNotPresent
name: executor
ports:
- containerPort: 6003
name: tmanager-stats
protocol: TCP
- containerPort: 6007
name: metrics-cache-m
protocol: TCP
- containerPort: 6004
name: shell-port
protocol: TCP
- containerPort: 6001
name: server
protocol: TCP
- containerPort: 6002
name: tmanager-ctl
protocol: TCP
- containerPort: 6009
name: ckptmgr
protocol: TCP
- containerPort: 6006
name: scheduler
protocol: TCP
- containerPort: 6005
name: metrics-mgr
protocol: TCP
- containerPort: 6008
name: metrics-cache-s
protocol: TCP
resources:
limits:
cpu: "3"
memory: 4Gi
requests:
cpu: "3"
memory: 4Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 10
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 10
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
status:
availableReplicas: 2
collisionCount: 0
currentReplicas: 2
currentRevision: acking-executors-548c6dbd6c
observedGeneration: 1
readyReplicas: 2
replicas: 2
updateRevision: acking-executors-548c6dbd6c
updatedReplicas: 2
My test shows success with the correct manager resource requests/limits.
The test shows success with the provided manager resource requests and limits.
@windhamwong thank you for also taking the time to review and test! 😄
Alright. Let's get this merged, @surahman. Nice work 💯
Feature #3724: Management/Driver pod should not use the same resources as the other topology pods. Should create separate deployment/service.
This PR builds upon #3725 in order to test the functionality of Volume Claim configurations in the Manager. Once #3725 is merged into master, I will hard-reset the feature branch for this PR onto master (this may temporarily close the PR). After that, I will merge the dev branch with the feature branch again and resolve any merge conflicts which may arise.

The following are the current features, and I am soliciting input across all areas:

- StatefulSets are named [topology-name]-manager and [topology-name]-executors.
- A single Service is used.
- The Manager is a duplicate of the Executor StatefulSet.
- The StatefulSets, the Service, and all Volume Claims which are generated for the topology are removed on termination.
- restart will restart the Manager and Executors for a topology.
- addContainers will only add Executor containers.
- removeContainers will only remove Executor containers.
- patchStatefulSetReplicas will only patch Executor containers.

Usage
The command pattern is as follows:

heron.kubernetes.manager.[limits | requests].[OPTION]=[VALUE]

The currently supported CLI options are:

- cpu
- memory

cpu must be a natural number, and memory must be a positive decimal indicating a value in Gigabytes.

Example:
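Under the Gigabyte convention described above, an invocation might look like the following (values are illustrative, matching the earlier test commands in this thread):

```shell
# Illustrative submit command using the heron.kubernetes.manager.* pattern;
# cpu is a natural number of cores, memory a value in Gigabytes.
~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.manager.limits.cpu=2 \
  --config-property heron.kubernetes.manager.limits.memory=3 \
  --config-property heron.kubernetes.manager.requests.cpu=1 \
  --config-property heron.kubernetes.manager.requests.memory=2
```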
Manager StatefulSet
Executor StatefulSet
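Since both StatefulSets carry the selector labels app=heron and topology=[topology-name], the whole topology can be addressed with one label query; an illustrative kubectl session (cluster-dependent, not run here):

```shell
# Both StatefulSets ([topology-name]-manager and [topology-name]-executors)
# share the labels app=heron,topology=<name>, so one selector reaches both.
kubectl get statefulsets -l app=heron,topology=acking

# Deleting by the same selector mirrors what deleteStatefulSets does
# (the deletecollection permission was added to the Helm chart for this).
kubectl delete statefulsets -l app=heron,topology=acking
```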