---
id: schedulers-k8s-execution-environment
title: Kubernetes Execution Environment Customization
sidebar_label: Kubernetes Execution Environment Customization
---
This document demonstrates how you can customize various aspects of the Heron execution environment when using the Kubernetes Scheduler.
Table of contents:
- Customizing the Heron Execution Environment
This section demonstrates how you can utilize custom Pod Templates embedded in Configuration Maps for your topology's `Executor`s and `Manager` (hereinafter referred to as `Heron containers`). You may specify different Pod Templates for different topologies.
When you deploy a topology to Heron on Kubernetes, you may specify individual Pod Templates to be used for your topology's `Executor`s and `Manager`. This is achieved by providing valid Pod Templates and embedding them in Configuration Maps. By default, Heron will use a minimally configured Pod Template, which is adequate to deploy a topology.
Pod Templates will allow you to configure most aspects of your topology's execution environment, with some exceptions. There are some aspects of Pods for which Heron will have the final say, and which will not be user-customizable. Please view the tables at the end of this section to identify what is set by Heron.
System Administrators:

- You may wish to disable the ability to load custom Pod Templates. To achieve this, you must pass the define option `-D heron.kubernetes.pod.template.disabled=true` to the Heron API Server on the command line when launching. This option has been added to the Kubernetes configuration files used to deploy the Heron API Server and is set to `false` by default.
- If you have a custom `Role` for the Heron API Server, you will need to ensure the `ServiceAccount` attached to the API Server, via a `RoleBinding`, has the correct permissions to access `ConfigMaps`:

  ```yaml
  rules:
  - apiGroups:
    - ""
    resources:
    - configmaps
    verbs:
    - get
    - list
  ```
To deploy a custom Pod Template to Kubernetes with your topology, you must provide a valid Pod Template embedded in a valid Configuration Map. We will be using the following variables throughout this document, some of which are reserved variable names:

- `POD-TEMPLATE-NAME`: The name of the Pod Template's YAML definition file. This is not a reserved variable and is a placeholder name.
- `CONFIG-MAP-NAME`: The name that will be used by `kubectl` for the Configuration Map in which the Pod Template will be embedded. This is not a reserved variable and is a placeholder name.
- `heron.kubernetes.[executor | manager].pod.template`: The key passed to Heron for the `--config-property` option on the CLI. This is a reserved variable name.

NOTE: Please do not use the `.` (period character) in the name of the `CONFIG-MAP-NAME`. This character will be used as a delimiter when submitting your topologies.
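To illustrate why the period is reserved, the sketch below mimics how a `CONFIG-MAP-NAME.POD-TEMPLATE-NAME` reference can be split on its first period. This is a plain shell illustration of the delimiter rule, not Heron's actual implementation:

```bash
# Hypothetical reference value, in the form passed via --config-property
ref="CONFIG-MAP-NAME.POD-TEMPLATE-NAME"

# Everything before the first period is taken as the ConfigMap name...
config_map="${ref%%.*}"
# ...and everything after it as the Pod Template key inside the ConfigMap.
pod_template="${ref#*.}"

echo "$config_map"    # CONFIG-MAP-NAME
echo "$pod_template"  # POD-TEMPLATE-NAME
```

A period inside the `CONFIG-MAP-NAME` itself would shift the split point, which is why it is disallowed.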
It is highly advised that you validate your Pod Templates before placing them in a `ConfigMap`, using a tool such as Kubeval or the built-in dry-run functionality in Kubernetes, to isolate any validity issues. While these tools are handy, they will not catch all potential errors in Kubernetes configurations.
NOTE: When submitting a Pod Template to customize an `Executor` or `Manager`, Heron will look for containers named `executor` and `manager` respectively. These containers will be modified to support the functioning of Heron; please read further below.
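As an illustration, a minimal Pod Template aimed at the `Executor`s would name its container `executor` so that Heron can find and modify it. This is only a sketch; the template name and label below are placeholder assumptions:

```yaml
apiVersion: v1
kind: PodTemplate
metadata:
  name: heron-executor-template   # placeholder name
template:
  metadata:
    labels:
      tier: streaming             # placeholder label
  spec:
    containers:
      - name: executor            # must be named "executor" for Heron to customize it
        image: apache/heron:latest
```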
An example of the Pod Template format is provided below, and is derived from the configuration for the Heron Tracker Pod:
```yaml
apiVersion: v1
kind: PodTemplate
metadata:
  name: heron-tracker
  namespace: default
template:
  metadata:
    labels:
      app: heron-tracker
  spec:
    containers:
      - name: heron-tracker
        image: apache/heron:latest
        ports:
          - containerPort: 8888
            name: api-port
        resources:
          requests:
            cpu: "100m"
            memory: "200M"
          limits:
            cpu: "400m"
            memory: "512M"
```
You would need to save this file as `POD-TEMPLATE-NAME`. Once you have a valid Pod Template you may proceed to generate a `ConfigMap`.

NOTE: You must place the `ConfigMap` in the same namespace as the Heron API Server, using the `--namespace` option in the commands below if the API Server is not in the `default` namespace.
To generate a `ConfigMap` you will need to run the following command:

```bash
kubectl create configmap CONFIG-MAP-NAME --from-file path/to/POD-TEMPLATE-NAME
```
You may then want to verify the contents of the `ConfigMap` by running the following command:

```bash
kubectl get configmaps CONFIG-MAP-NAME -o yaml
```
The `ConfigMap` should appear similar to the one below for our example:

```yaml
apiVersion: v1
data:
  POD-TEMPLATE-NAME: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: heron-tracker
      namespace: default
    template:
      metadata:
        labels:
          app: heron-tracker
      spec:
        containers:
          - name: heron-tracker
            image: apache/heron:latest
            ports:
              - containerPort: 8888
                name: api-port
            resources:
              requests:
                cpu: "100m"
                memory: "200M"
              limits:
                cpu: "400m"
                memory: "512M"
kind: ConfigMap
metadata:
  creationTimestamp: "2021-09-27T21:55:30Z"
  name: CONFIG-MAP-NAME
  namespace: default
  resourceVersion: "1313"
  uid: ba001653-03d9-4ac8-804c-d2c55c974281
```
To use the `ConfigMap` for a topology, you will need to submit with the additional flag `--config-property`. The `--config-property key=value` takes a key-value pair:

- Key: `heron.kubernetes.[executor | manager].pod.template`
- Value: `CONFIG-MAP-NAME.POD-TEMPLATE-NAME`

Please note that you must concatenate `CONFIG-MAP-NAME` and `POD-TEMPLATE-NAME` with a `.` (period character).
For example:

```bash
heron submit kubernetes \
  --service-url=http://localhost:8001/api/v1/namespaces/default/services/heron-apiserver:9000/proxy \
  ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.executor.pod.template=CONFIG-MAP-NAME.POD-TEMPLATE-NAME \
  --config-property heron.kubernetes.manager.pod.template=CONFIG-MAP-NAME.POD-TEMPLATE-NAME
```
Heron will locate the containers named `executor` and/or `manager` in the Pod Template and customize them as outlined below. All other containers within the Pod Templates will remain unchanged.

All metadata for the `Heron containers` will be overwritten by Heron. In some other cases, values from the Pod Template for the `executor` and `manager` will be overwritten by Heron, as outlined below.
| Name | Description | Policy |
|---|---|---|
| `image` | The `Heron container`'s image. | Overwritten by Heron using values from the config. |
| `env` | Environment variables made available within the container. The `HOST` and `POD_NAME` keys are required by Heron and are thus reserved. | Merged with Heron's values taking precedence. Deduplication is based on `name`. |
| `ports` | Port numbers opened within the container. Some of these port numbers are required by Heron and are thus reserved. The reserved ports are defined in Heron's constants as [`6001`-`6010`]. | Merged with Heron's values taking precedence. Deduplication is based on the `containerPort` value. |
| `limits`, `requests` | Heron will attempt to load values for `cpu` and `memory` from configs. | Heron's values take precedence over those in the Pod Templates. |
| `volumeMounts` | The mount points within the `Heron container` for the volumes available in the Pod. | Merged with Heron's values taking precedence. Deduplication is based on the `name` value. |
| Annotation: `prometheus.io/scrape` | Flag to indicate whether Prometheus logs can be scraped; set to `true`. | Value is overridden by Heron. |
| Annotation: `prometheus.io/port` | Port address for Prometheus log scraping; set to `8080`. | Value is overridden by Heron. |
| Annotation: Pod | Pod's revision/version hash. | Automatically set. |
| Annotation: Service | Labels services can use to attach to the Pod. | Automatically set. |
| Label: `app` | Name of the application launching the Pod; set to `Heron`. | Value is overridden by Heron. |
| Label: `topology` | The name of the topology provided when submitting. | User-defined and supplied on the CLI. |
The following items will be set in the Pod Template's `spec` by Heron.

| Name | Description | Policy |
|---|---|---|
| `terminationGracePeriodSeconds` | Grace period to wait before shutting down the Pod after a `SIGTERM` signal; set to `0` seconds. | Value is overridden by Heron. |
| `tolerations` | Attempts to schedule Pods with `taints` onto nodes hosting Pods with matching `taints`. The entries below are included by default.<br/>Keys: `node.kubernetes.io/not-ready`, `node.kubernetes.io/unreachable`<br/>Values (common): `operator: Exists`, `effect: NoExecute`, `tolerationSeconds: 10L` | Merged with Heron's values taking precedence. Deduplication is based on the `key` value. |
| `containers` | Configurations for containers to be launched within the Pod. | All containers, excluding the `Heron container`s, are loaded as-is. |
| `volumes` | Volumes to be made available to the entire Pod. | Merged with Heron's values taking precedence. Deduplication is based on the `name` value. |
| `secretVolumes` | Secrets to be mounted as volumes within the Pod. | Loaded from the Heron configs if present. |
This section demonstrates how you can utilize both statically and dynamically backed Persistent Volume Claims in the `Executor` and `Manager` containers (hereinafter referred to as `Heron containers`). You will need to enable Dynamic Provisioning in your Kubernetes cluster to use the dynamic provisioning functionality.

NOTE: It is possible to leverage Persistent Volumes with custom Pod Templates, but when customizing the `Executor`s the Volumes you add will be shared between all `Executor` Pods in the topology.
The CLI commands allow you to configure a Persistent Volume Claim (dynamically or statically backed) which will be unique and isolated to each Pod and mounted in a single `Heron container` when you submit your topology with a claim name of `OnDemand`. Using any claim name other than `OnDemand` will permit you to configure a shared Persistent Volume, without a custom Pod Template, which will be shared between all `Executor` Pods when customizing them. The CLI commands override any configurations you may have present in the Pod Template, but Heron's configurations will take precedence over all others.
Some use cases include process checkpointing, caching of results for later use in the process, intermediate results which could prove useful in analysis (ETL/ELT to a data lake or warehouse), as a source of data enrichment, etc.
Note: Heron will remove any dynamically backed Persistent Volume Claims it creates when a topology is terminated. Please be aware that Heron uses the following `Labels` to locate the claims it has created:

```yaml
metadata:
  labels:
    topology: <topology-name>
    onDemand: "true"
```
System Administrators:

- You may wish to disable the ability to configure Persistent Volume Claims specified via the CLI. To achieve this, you must pass the define option `-D heron.kubernetes.persistent.volume.claims.cli.disabled=true` to the Heron API Server on the command line when launching. This option has been added to the Kubernetes configuration files used to deploy the Heron API Server and is set to `false` by default.
- If you have a custom `Role`/`ClusterRole` for the Heron API Server, you will need to ensure the `ServiceAccount` attached to the API Server has the correct permissions to access `PersistentVolumeClaim`s:

  ```yaml
  rules:
  - apiGroups:
    - ""
    resources:
    - persistentvolumeclaims
    verbs:
    - create
    - delete
    - get
    - list
    - deletecollection
  ```
To configure a Persistent Volume Claim you must use the `--config-property` option with the `heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim.` command prefix. Heron will not validate your Persistent Volume Claim configurations, so please validate them to ensure they are well-formed. All names must comply with the lowercase RFC 1123 standard.
The command pattern is as follows:

```
heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim.[VOLUME NAME].[OPTION]=[VALUE]
```

The currently supported CLI options are:

- `claimName`
- `storageClassName`
- `sizeLimit`
- `accessModes`
- `volumeMode`
- `path`
- `subPath`
Note: A `claimName` of `OnDemand` will create unique Volumes for each `Heron container`, as well as deploy a Persistent Volume Claim for each Volume. Any other claim name will result in a shared Volume being created between all Pods in the topology.
Note: The `accessModes` must be a comma-separated list of values without any white space. Valid values can be found in the Kubernetes documentation.
Note: If a `storageClassName` is specified and there are no matching Persistent Volumes, then dynamic provisioning must be enabled. Kubernetes will attempt to locate a Persistent Volume that matches the `storageClassName` before it attempts to use dynamic provisioning. If a `storageClassName` is not specified, there must be Persistent Volumes provisioned manually with a `storageClassName` of `standard`.
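For the static case, a manually provisioned Persistent Volume might look like the sketch below. Only the `storageClassName: standard` requirement comes from the note above; the name, capacity, access mode, and `hostPath` backing are placeholder assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: heron-static-pv          # placeholder name
spec:
  capacity:
    storage: 555Gi               # must cover the sizeLimit you request
  accessModes:
    - ReadWriteOnce              # placeholder access mode
  storageClassName: standard     # expected when no storageClassName is given on the CLI
  hostPath:
    path: /data/heron            # placeholder backing store
```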
A series of example commands to add `Persistent Volumes` to `Executor`s, and the YAML entries they make in their respective configurations, are as follows.
Dynamic:

```bash
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.storageClassName=storage-class-name-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount
```
Generated `Persistent Volume Claim`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: heron
    onDemand: "true"
    topology: <topology-name>
  name: volumenameofchoice-<topology-name>-[Ordinal]
spec:
  accessModes:
    - comma
    - separated
    - list
  resources:
    requests:
      storage: 555Gi
  storageClassName: storage-class-name-of-choice
  volumeMode: volume-mode-of-choice
```
Pod Spec entries for `Volume`:

```yaml
volumes:
  - name: volumenameofchoice
    persistentVolumeClaim:
      claimName: volumenameofchoice-<topology-name>-[Ordinal]
```

`Executor` container entries for `Volume Mounts`:

```yaml
volumeMounts:
  - mountPath: /path/to/mount
    subPath: /sub/path/to/mount
    name: volumenameofchoice
```
Static:

```bash
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount
```
Generated `Persistent Volume Claim`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: heron
    onDemand: "true"
    topology: <topology-name>
  name: volumenameofchoice-<topology-name>-[Ordinal]
spec:
  accessModes:
    - comma
    - separated
    - list
  resources:
    requests:
      storage: 555Gi
  storageClassName: standard
  volumeMode: volume-mode-of-choice
```
Pod Spec entries for `Volume`:

```yaml
volumes:
  - name: volumenameofchoice
    persistentVolumeClaim:
      claimName: volumenameofchoice-<topology-name>-[Ordinal]
```

`Executor` container entries for `Volume Mounts`:

```yaml
volumeMounts:
  - mountPath: /path/to/mount
    subPath: /sub/path/to/mount
    name: volumenameofchoice
```
A series of example commands to submit a topology using the dynamic example CLI commands above:
```bash
heron submit kubernetes \
  --service-url=http://localhost:8001/api/v1/namespaces/default/services/heron-apiserver:9000/proxy \
  ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.storageClassName=storage-class-name-of-choice \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount
```
The following table outlines CLI options which are either required (✅), optional (❔), or not available (❌), depending on whether you are using dynamically backed, statically backed, or shared `Volume`s.

| Option | Dynamic | Static | Shared |
|---|---|---|---|
| `VOLUME NAME` | ✅ | ✅ | ✅ |
| `claimName` | `OnDemand` | `OnDemand` | A valid name |
| `path` | ✅ | ✅ | ✅ |
| `subPath` | ❔ | ❔ | ❔ |
| `storageClassName` | ✅ | ❌ | ❌ |
| `accessModes` | ✅ | ✅ | ❌ |
| `sizeLimit` | ❔ | ❔ | ❌ |
| `volumeMode` | ❔ | ❔ | ❌ |
Note: The `VOLUME NAME` will be extracted from the CLI command, and a `claimName` is always required.
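As a plain shell illustration of how the command pattern decomposes (not Heron's actual parsing code), the `[VOLUME NAME]` and `[OPTION]` segments sit between the fixed prefix and the value:

```bash
# Hypothetical full property key, as supplied on the CLI
prefix="heron.kubernetes.executor.volumes.persistentVolumeClaim."
prop="${prefix}volumenameofchoice.claimName"

# Strip the fixed prefix, then split the remainder on its period
rest="${prop#"$prefix"}"
volume_name="${rest%%.*}"
option="${rest#*.}"

echo "$volume_name"  # volumenameofchoice
echo "$option"       # claimName
```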
The configuration items and entries in the table below will be made in their respective areas. A `Volume` and a `Volume Mount` will be created for each `volume name` you specify. Additionally, one `Persistent Volume Claim` will be created for each `Volume` specified as dynamic using the `OnDemand` claim name.
| Name | Description | Policy |
|---|---|---|
| `VOLUME NAME` | The name of the `Volume`. | Entries are made in the `Persistent Volume Claim`'s spec, the Pod Spec's `volumes`, and the `Heron container`'s `volumeMounts`. |
| `claimName` | A claim name for the Persistent Volume. | If `OnDemand` is provided as the parameter, then a unique `Volume` and `Persistent Volume Claim` will be created. Any other name will result in a shared `Volume` between all Pods in the topology, with only a `Volume` and `Volume Mount` being added. |
| `path` | The `mountPath` of the `Volume`. | Entries are made in the `Heron container`'s `volumeMounts`. |
| `subPath` | The `subPath` of the `Volume`. | Entries are made in the `Heron container`'s `volumeMounts`. |
| `storageClassName` | The identifier name used to reference the dynamic `StorageClass`. | Entries are made in the `Persistent Volume Claim` and the Pod Spec's `Volume`. |
| `accessModes` | A comma-separated list of access modes. | Entries are made in the `Persistent Volume Claim`. |
| `sizeLimit` | A resource request for storage space units. | Entries are made in the `Persistent Volume Claim`. |
| `volumeMode` | Either `Filesystem` (default) or `Block` (raw block). | Entries are made in the `Persistent Volume Claim`. |
| Labels | Two labels for `topology` and `onDemand` provisioning are added. | These labels are only added to dynamically backed `Persistent Volume Claim`s created by Heron, to support the removal of any claims created when a topology is terminated. |
This section demonstrates how you can configure a topology's `Executor`s and/or `Manager` (hereinafter referred to as `Heron containers`) resource `Requests` and `Limits` through CLI commands.

You may configure an individual topology's `Heron container` resource `Requests` and `Limits` during submission through CLI commands. The default behaviour is to acquire values for resources from Configurations, and for them to be common between the `Executor`s and the `Manager` for a topology.
The command pattern is as follows:

```
heron.kubernetes.[executor | manager].[limits | requests].[OPTION]=[VALUE]
```

The currently supported CLI options and their associated values are:

- `cpu`: A natural number indicating the number of CPU units.
- `memory`: A natural number indicating the amount of memory units.
An example submission command is as follows.

Limits and Requests:

```bash
~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.manager.limits.cpu=2 \
  --config-property heron.kubernetes.manager.limits.memory=3 \
  --config-property heron.kubernetes.manager.requests.cpu=1 \
  --config-property heron.kubernetes.manager.requests.memory=2
```
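With the command above, the `Manager` container's resources section would then carry these values. This is a sketch of the resulting shape only; the memory unit Heron applies is an assumption here, so consult your Heron configuration for the unit in use:

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2"   # memory unit depends on the Heron configuration in use
  limits:
    cpu: "2"
    memory: "3"   # memory unit depends on the Heron configuration in use
```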