This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

id: schedulers-k8s-execution-environment
title: Kubernetes Execution Environment Customization
sidebar_label: Kubernetes Execution Environment Customization

Customizing the Heron Execution Environment

This document demonstrates how you can customize various aspects of the Heron execution environment when using the Kubernetes Scheduler.




Customizing a Topology's Execution Environment Using Pod Templates


This section demonstrates how you can utilize custom Pod Templates embedded in Configuration Maps for your Topology's Executors and Manager (hereinafter referred to as Heron containers). You may specify different Pod Templates for different topologies.


When you deploy a topology to Heron on Kubernetes, you may specify individual Pod Templates for your topology's Executors and Manager. To do so, provide valid Pod Templates embedded in Configuration Maps. By default, Heron uses a minimally configured Pod Template which is adequate to deploy a topology.

Pod Templates will allow you to configure most aspects of your topology's execution environment, with some exceptions. There are some aspects of Pods for which Heron will have the final say, and which will not be user-customizable. Please view the tables at the end of this section to identify what is set by Heron.


System Administrators:

  • You may wish to disable the ability to load custom Pod Templates. To achieve this, you must pass the define option -D heron.kubernetes.pod.template.disabled=true to the Heron API Server on the command line when launching. This option is included in the Kubernetes configuration files used to deploy the Heron API Server and is set to false by default.
  • If you have a custom Role for the Heron API Server you will need to ensure the ServiceAccount attached to the API server, via a RoleBinding, has the correct permissions to access the ConfigMaps:
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list

Preparation

To deploy a custom Pod Template to Kubernetes with your topology, you must provide a valid Pod Template embedded in a valid Configuration Map. We will be using the following variables throughout this document, some of which are reserved variable names:

  • POD-TEMPLATE-NAME: This is the name of the Pod Template's YAML definition file. This is not a reserved variable and is a place-holder name.
  • CONFIG-MAP-NAME: This is the name that will be used by the Configuration Map in which the Pod Template will be embedded by kubectl. This is not a reserved variable and is a place-holder name.
  • heron.kubernetes.[executor | manager].pod.template: This variable name is used as the key passed to Heron for the --config-property on the CLI. This is a reserved variable name.

NOTE: Please do not use the . (period character) in the name of the CONFIG-MAP-NAME. This character will be used as a delimiter when submitting your topologies.

We strongly advise validating your Pod Templates before placing them in a ConfigMap, using a tool such as Kubeval or the built-in dry-run functionality in Kubernetes, to isolate any validity issues. These tools are handy, but they will not catch every potential error in a Kubernetes configuration.

NOTE: When submitting a Pod Template to customize an Executor or Manager, Heron will look for containers named executor and manager respectively. These containers will be modified to support the functioning of Heron; the details are listed under "Heron Configured Items in Pod Templates".
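
The container lookup described in the note above can be pictured with a small Python sketch. It is not Heron's implementation: the helper name `find_heron_container`, the `sidecar` container, and the assumption that the Pod Template has already been parsed from YAML into plain dictionaries are all illustrative.

```python
from typing import Optional


def find_heron_container(pod_template: dict, name: str) -> Optional[dict]:
    """Return the container called `name` from a parsed Pod Template, or None.

    Hypothetical helper: `pod_template` is assumed to be the Pod Template
    YAML already loaded into nested dicts and lists.
    """
    spec = pod_template.get("template", {}).get("spec", {})
    for container in spec.get("containers", []):
        if container.get("name") == name:
            return container
    return None


# Illustrative template: one Heron container plus an unrelated sidecar.
template = {
    "apiVersion": "v1",
    "kind": "PodTemplate",
    "template": {
        "spec": {
            "containers": [
                {"name": "executor", "image": "apache/heron:latest"},
                {"name": "sidecar", "image": "busybox"},  # assumed example container
            ]
        }
    },
}

executor = find_heron_container(template, "executor")
manager = find_heron_container(template, "manager")
print(executor is not None, manager is None)
```

A check like this can be run before embedding the template in a ConfigMap to confirm that the container you intend Heron to customize carries the expected name.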

Pod Templates

An example of the Pod Template format is provided below, and is derived from the configuration for the Heron Tracker Pod:

apiVersion: v1
kind: PodTemplate
metadata:
  name: heron-tracker
  namespace: default
template:
  metadata:
    labels:
      app: heron-tracker
  spec:
    containers:
      - name: heron-tracker
        image: apache/heron:latest
        ports:
          - containerPort: 8888
            name: api-port
        resources:
          requests:
            cpu: "100m"
            memory: "200M"
          limits:
            cpu: "400m"
            memory: "512M"

You would need to save this file as POD-TEMPLATE-NAME. Once you have a valid Pod Template you may proceed to generate a ConfigMap.

Configuration Maps

You must place the ConfigMap in the same namespace as the Heron API Server. If the API Server is not in the default namespace, add the --namespace option to the commands below.

To generate a ConfigMap you will need to run the following command:

kubectl create configmap CONFIG-MAP-NAME --from-file path/to/POD-TEMPLATE-NAME

You may then want to verify the contents of the ConfigMap by running the following command:

kubectl get configmaps CONFIG-MAP-NAME -o yaml

The ConfigMap should appear similar to the one below for our example:

apiVersion: v1
data:
  POD-TEMPLATE-NAME: |
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: heron-tracker
      namespace: default
    template:
      metadata:
        labels:
          app: heron-tracker
      spec:
        containers:
          - name: heron-tracker
            image: apache/heron:latest
            ports:
              - containerPort: 8888
                name: api-port
            resources:
              requests:
                cpu: "100m"
                memory: "200M"
              limits:
                cpu: "400m"
                memory: "512M"
kind: ConfigMap
metadata:
  creationTimestamp: "2021-09-27T21:55:30Z"
  name: CONFIG-MAP-NAME
  namespace: default
  resourceVersion: "1313"
  uid: ba001653-03d9-4ac8-804c-d2c55c974281

Submitting

To use the ConfigMap for a topology, you will need to submit with the additional flag --config-property. The --config-property key=value option takes a key-value pair:

  • Key: heron.kubernetes.[executor | manager].pod.template
  • Value: CONFIG-MAP-NAME.POD-TEMPLATE-NAME

Please note that you must concatenate CONFIG-MAP-NAME and POD-TEMPLATE-NAME with a . (period character).
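
As a rough illustration of why the period is reserved, the Python sketch below joins the two names and splits them again. The function names are hypothetical, and splitting at the first period is an assumption about how the value is interpreted, not a description of Heron's code.

```python
from typing import Tuple


def pod_template_reference(config_map_name: str, pod_template_name: str) -> str:
    """Build the CONFIG-MAP-NAME.POD-TEMPLATE-NAME value (hypothetical helper)."""
    if "." in config_map_name:
        # A period in the ConfigMap name would make the delimiter ambiguous.
        raise ValueError("CONFIG-MAP-NAME must not contain a period")
    return f"{config_map_name}.{pod_template_name}"


def split_pod_template_reference(value: str) -> Tuple[str, str]:
    """Split at the FIRST period (assumed behaviour, for illustration only)."""
    config_map_name, _, pod_template_name = value.partition(".")
    return config_map_name, pod_template_name


reference = pod_template_reference("pod-templates", "executor-template.yaml")
print(reference)
print(split_pod_template_reference(reference))
```

Under this assumption the Pod Template file name may itself contain periods (for example a `.yaml` extension), whereas the ConfigMap name may not.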

For example:

heron submit kubernetes \
  --service-url=http://localhost:8001/api/v1/namespaces/default/services/heron-apiserver:9000/proxy \
  ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.executor.pod.template=CONFIG-MAP-NAME.POD-TEMPLATE-NAME \
  --config-property heron.kubernetes.manager.pod.template=CONFIG-MAP-NAME.POD-TEMPLATE-NAME

Heron Configured Items in Pod Templates

Heron will locate the containers named executor and/or manager in the Pod Template and customize them as outlined below. All other containers within the Pod Templates will remain unchanged.

Executor and Manager Containers

All metadata for the Heron containers will be overwritten by Heron. In some other cases, values from the Pod Template for the executor and manager will be overwritten by Heron as outlined below.

| Name | Description | Policy |
|---|---|---|
| image | The Heron container's image. | Overwritten by Heron using values from the config. |
| env | Environment variables are made available within the container. The HOST and POD_NAME keys are required by Heron and are thus reserved. | Merged with Heron's values taking precedence. Deduplication is based on name. |
| ports | Port numbers opened within the container. Some of these port numbers are required by Heron and are thus reserved. The reserved ports are defined in Heron's constants as [6001-6010]. | Merged with Heron's values taking precedence. Deduplication is based on the containerPort value. |
| limits, requests | Heron will attempt to load values for cpu and memory from configs. | Heron's values take precedence over those in the Pod Templates. |
| volumeMounts | These are the mount points within the Heron container for the volumes available in the Pod. | Merged with Heron's values taking precedence. Deduplication is based on the name value. |
| Annotation: prometheus.io/scrape | Flag to indicate whether Prometheus logs can be scraped and is set to true. | Value is overridden by Heron. |
| Annotation: prometheus.io/port | Port address for Prometheus log scraping and is set to 8080. | Values are overridden by Heron. |
| Annotation: Pod | Pod's revision/version hash. | Automatically set. |
| Annotation: Service | Labels services can use to attach to the Pod. | Automatically set. |
| Label: app | Name of the application launching the Pod and is set to Heron. | Values are overridden by Heron. |
| Label: topology | The name of topology which was provided when submitting. | User-defined and supplied on the CLI. |
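
The "merged with Heron's values taking precedence" policy can be read as a keyed merge. The Python sketch below shows one way such a merge could behave for env entries deduplicated by name; the function name and the sample values are assumptions, and the ordering of the result is a choice made here rather than something the policy specifies.

```python
from typing import Dict, List


def merge_with_precedence(user_items: List[Dict], heron_items: List[Dict], key: str) -> List[Dict]:
    """Merge two lists of entries, deduplicating on `key`.

    Hypothetical helper: when both lists contain an entry with the same
    key value, the entry from `heron_items` replaces the user's entry.
    """
    merged = {item[key]: item for item in user_items}
    merged.update({item[key]: item for item in heron_items})
    return list(merged.values())


# Sample values are invented for illustration.
user_env = [
    {"name": "HOST", "value": "user-defined"},
    {"name": "MY_SETTING", "value": "1"},
]
heron_env = [
    {"name": "HOST", "value": "set-by-heron"},
    {"name": "POD_NAME", "value": "set-by-heron"},
]

merged_env = merge_with_precedence(user_env, heron_env, key="name")
for entry in merged_env:
    print(entry["name"], "=", entry["value"])
```

The same shape applies to ports (keyed on containerPort) and volumeMounts (keyed on name): a user entry survives only when Heron does not define an entry with the same key.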

Pod

The following items will be set in the Pod Template's spec by Heron.

| Name | Description | Policy |
|---|---|---|
| terminationGracePeriodSeconds | Grace period to wait before shutting down the Pod after a SIGTERM signal and is set to 0 seconds. | Values are overridden by Heron. |
| tolerations | Allows Pods to be scheduled onto nodes with matching taints. The entries below are included by default. Keys: node.kubernetes.io/not-ready, node.kubernetes.io/unreachable. Values (common): operator: Exists, effect: NoExecute, tolerationSeconds: 10L. | Merged with Heron's values taking precedence. Deduplication is based on the key value. |
| containers | Configurations for containers to be launched within the Pod. | All containers, excluding the Heron containers, are loaded as-is. |
| volumes | Volumes to be made available to the entire Pod. | Merged with Heron's values taking precedence. Deduplication is based on the name value. |
| secretVolumes | Secrets to be mounted as volumes within the Pod. | Loaded from the Heron configs if present. |



Adding Persistent Volumes via the Command Line Interface


This section demonstrates how you can utilize both static and dynamically backed Persistent Volume Claims in the Executor and Manager containers (hereinafter referred to as Heron containers). You will need to enable Dynamic Provisioning in your Kubernetes cluster to proceed to use the dynamic provisioning functionality.


It is possible to leverage Persistent Volumes with custom Pod Templates but the Volumes you add will be shared between all Executor Pods in the topology when customizing the Executors.

The CLI commands allow you to configure a Persistent Volume Claim (dynamically or statically backed) that is unique and isolated to each Pod and mounted in a single Heron container; to do so, submit your topology with a claim name of OnDemand. Using any claim name other than OnDemand lets you configure a shared Persistent Volume without a custom Pod Template; when customizing the Executors, that Volume is shared between all Executor Pods. The CLI commands override any configurations present in the Pod Template, but Heron's configurations take precedence over all others.

Some use cases include process checkpointing, caching of results for later use in the process, intermediate results which could prove useful in analysis (ETL/ELT to a data lake or warehouse), as a source of data enrichment, etc.

Note: Heron will remove any dynamically backed Persistent Volume Claims it creates when a topology is terminated. Please be aware that Heron uses the following Labels to locate the claims it has created:

metadata:
  labels:
    topology: <topology-name>
    onDemand: "true"
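
For administrators who want to inspect these claims, the labels above translate into an equality-based label selector. The Python sketch below only builds that selector string; the helper name is hypothetical and it does not contact a cluster.

```python
def on_demand_claim_selector(topology_name: str) -> str:
    """Build a label selector matching the two labels shown above.

    Hypothetical helper: returns a string in Kubernetes' equality-based
    selector syntax (comma-separated key=value pairs).
    """
    if not topology_name:
        raise ValueError("topology_name must not be empty")
    return f"topology={topology_name},onDemand=true"


selector = on_demand_claim_selector("acking")
print(selector)
```

The resulting string can be passed to the `-l` option of `kubectl get persistentvolumeclaims` to list the claims carrying both labels.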

System Administrators:

  • You may wish to disable the ability to configure Persistent Volume Claims specified via the CLI. To achieve this, you must pass the define option -D heron.kubernetes.persistent.volume.claims.cli.disabled=true to the Heron API Server on the command line when launching. This option is included in the Kubernetes configuration files used to deploy the Heron API Server and is set to false by default.
  • If you have a custom Role/ClusterRole for the Heron API Server you will need to ensure the ServiceAccount attached to the API server has the correct permissions to access the Persistent Volume Claims:
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
    verbs:
      - create
      - delete
      - get
      - list
      - deletecollection

Usage

To configure a Persistent Volume Claim you must use the --config-property option with the heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim. command prefix. Heron will not validate your Persistent Volume Claim configurations, so please validate them to ensure they are well-formed. All names must comply with the lowercase RFC-1123 standard.

The command pattern is as follows: heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim.[VOLUME NAME].[OPTION]=[VALUE]

The currently supported CLI options are:

  • claimName
  • storageClassName
  • sizeLimit
  • accessModes
  • volumeMode
  • path
  • subPath

Note: A claimName of OnDemand will create unique Volumes for each Heron container as well as deploy a Persistent Volume Claim for each Volume. Any other claim name will result in a shared Volume being created between all Pods in the topology.

Note: The accessModes must be a comma-separated list of values without any white space. Valid values can be found in the Kubernetes documentation.

Note: If a storageClassName is specified and there are no matching Persistent Volumes then dynamic provisioning must be enabled. Kubernetes will attempt to locate a Persistent Volume that matches the storageClassName before it attempts to use dynamic provisioning. If a storageClassName is not specified there must be Persistent Volumes provisioned manually with the storageClassName of standard.
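
Because Heron does not validate these options, it can help to assemble them programmatically. The Python sketch below is one possible pre-submission helper, not part of Heron: the function name, the specific checks, and the use of the Kubernetes RFC-1123 label pattern for the volume name are assumptions drawn from the rules stated above.

```python
import re
from typing import Dict, List

# Lowercase RFC-1123 label: alphanumerics and '-', starting and ending alphanumeric.
RFC_1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
SUPPORTED_OPTIONS = {
    "claimName", "storageClassName", "sizeLimit",
    "accessModes", "volumeMode", "path", "subPath",
}


def pvc_config_properties(role: str, volume_name: str, options: Dict[str, str]) -> List[str]:
    """Return --config-property arguments for one volume (hypothetical helper)."""
    if role not in ("executor", "manager"):
        raise ValueError("role must be 'executor' or 'manager'")
    if not RFC_1123_LABEL.fullmatch(volume_name):
        raise ValueError("volume name must be a lowercase RFC-1123 name")
    if "claimName" not in options:
        raise ValueError("claimName is always required")
    unknown = set(options) - SUPPORTED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    if any(ch.isspace() for ch in options.get("accessModes", "")):
        raise ValueError("accessModes must not contain white space")

    prefix = f"heron.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    args: List[str] = []
    for option, value in options.items():
        args += ["--config-property", f"{prefix}.{option}={value}"]
    return args


pvc_args = pvc_config_properties(
    "executor",
    "volumenameofchoice",
    {"claimName": "OnDemand", "accessModes": "ReadWriteOnce", "path": "/path/to/mount"},
)
print(" ".join(pvc_args))
```

The returned list can be appended to the arguments of a `heron submit` invocation; the checks mirror the notes above and are deliberately stricter than anything Heron itself enforces.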


Example

A series of example commands to add Persistent Volumes to Executors, and the YAML entries they make in their respective configurations, are as follows.

Dynamic:

--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.storageClassName=storage-class-name-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount

Generated Persistent Volume Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: heron
    onDemand: "true"
    topology: <topology-name>
  name: volumenameofchoice-<topology-name>-[Ordinal]
spec:
  accessModes:
  - comma
  - separated
  - list
  resources:
    requests:
      storage: 555Gi
  storageClassName: storage-class-name-of-choice
  volumeMode: volume-mode-of-choice

Pod Spec entries for Volume:

volumes:
  - name: volumenameofchoice
    persistentVolumeClaim:
      claimName: volumenameofchoice-<topology-name>-[Ordinal]

Executor container entries for Volume Mounts:

volumeMounts:
  - mountPath: /path/to/mount
    subPath: /sub/path/to/mount
    name: volumenameofchoice
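
The generated claim names in the dynamic example follow the pattern volume name, topology name, then ordinal. The Python sketch below reproduces that pattern for illustration; the helper name is hypothetical, and treating the ordinal as a zero-based Pod index is an assumption rather than something this document states.

```python
from typing import List


def on_demand_claim_names(volume_name: str, topology_name: str, pod_count: int) -> List[str]:
    """List claim names of the form <volume>-<topology>-<ordinal>.

    Hypothetical helper: ordinals are assumed to run from 0 to pod_count - 1.
    """
    return [f"{volume_name}-{topology_name}-{ordinal}" for ordinal in range(pod_count)]


claim_names = on_demand_claim_names("volumenameofchoice", "acking", 3)
for claim_name in claim_names:
    print(claim_name)
```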

Static:

--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount
--config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount

Generated Persistent Volume Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: heron
    onDemand: "true"
    topology: <topology-name>
  name: volumenameofchoice-<topology-name>-[Ordinal]
spec:
  accessModes:
  - comma
  - separated
  - list
  resources:
    requests:
      storage: 555Gi
  storageClassName: standard
  volumeMode: volume-mode-of-choice

Pod Spec entries for Volume:

volumes:
  - name: volumenameofchoice
    persistentVolumeClaim:
      claimName: volumenameofchoice-<topology-name>-[Ordinal]

Executor container entries for Volume Mounts:

volumeMounts:
  - mountPath: /path/to/mount
    subPath: /sub/path/to/mount
    name: volumenameofchoice

Submitting

An example command to submit a topology using the dynamic example CLI options above:

heron submit kubernetes \
  --service-url=http://localhost:8001/api/v1/namespaces/default/services/heron-apiserver:9000/proxy \
  ~/.heron/examples/heron-api-examples.jar \
  org.apache.heron.examples.api.AckingTopology acking \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.claimName=OnDemand \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.storageClassName=storage-class-name-of-choice \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.accessModes=comma,separated,list \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.sizeLimit=555Gi \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.volumeMode=volume-mode-of-choice \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.path=/path/to/mount \
  --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.volumenameofchoice.subPath=/sub/path/to/mount

Required and Optional Configuration Items

The following table outlines CLI options which are either required ( ✅ ), optional ( ❔ ), or not available ( ❌ ) depending on if you are using dynamically/statically backed or shared Volumes.

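| Option | Dynamic | Static | Shared |
|---|---|---|---|
| VOLUME NAME | | | |
| claimName | OnDemand | OnDemand | A valid name |
| path | | | |
| subPath | | | |
| storageClassName | | | |
| accessModes | | | |
| sizeLimit | | | |
| volumeMode | | | |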

Note: The VOLUME NAME will be extracted from the CLI command, and a claimName is always required.


Configuration Items Created and Entries Made

The configuration items and entries in the table below will be made in their respective areas.

A Volume and a Volume Mount will be created for each volume name which you specify. Additionally, one Persistent Volume Claim will be created for each Volume specified as dynamic using the OnDemand claim name.

| Name | Description | Policy |
|---|---|---|
| VOLUME NAME | The name of the Volume. | Entries made in the Persistent Volume Claim's spec, the Pod Spec's Volumes, and the Heron container's volumeMounts. |
| claimName | A Claim name for the Persistent Volume. | If OnDemand is provided as the parameter then a unique Volume and Persistent Volume Claim will be created. Any other name will result in a shared Volume between all Pods in the topology with only a Volume and Volume Mount being added. |
| path | The mountPath of the Volume. | Entries made in the Heron container's volumeMounts. |
| subPath | The subPath of the Volume. | Entries made in the Heron container's volumeMounts. |
| storageClassName | The identifier name used to reference the dynamic StorageClass. | Entries made in the Persistent Volume Claim and Pod Spec's Volume. |
| accessModes | A comma-separated list of access modes. | Entries made in the Persistent Volume Claim. |
| sizeLimit | A resource request for storage space units. | Entries made in the Persistent Volume Claim. |
| volumeMode | Either Filesystem (default) or Block (raw block). | Entries made in the Persistent Volume Claim. |
| Labels | Two labels for topology and onDemand provisioning are added. | These labels are only added to dynamically backed Persistent Volume Claims created by Heron to support the removal of any claims created when a topology is terminated. |



Setting Limits and Requests via the Command Line Interface

This section demonstrates how you can configure a topology's Executor and/or Manager (hereinafter referred to as Heron containers) resource Requests and Limits through CLI commands.


You may configure the resource Requests and Limits of an individual topology's Heron containers during submission through CLI commands. By default, resource values are acquired from Configurations and are common to the Executors and the Manager of a topology.


Usage

The command pattern is as follows: heron.kubernetes.[executor | manager].[limits | requests].[OPTION]=[VALUE]

The currently supported CLI options and their associated values are:

  • cpu: A natural number indicating the number of CPU units.
  • memory: A natural number indicating the number of memory units.
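
The command pattern above lends itself to a small generator. The Python sketch below assembles the --config-property arguments for Limits and Requests; the function name is hypothetical, and rejecting negative or non-integer values is an assumption based on the "natural number" wording rather than a documented Heron check.

```python
from typing import Dict, List, Optional


def resource_config_properties(
    role: str,
    limits: Optional[Dict[str, int]] = None,
    requests: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return --config-property arguments for Limits and Requests (hypothetical helper)."""
    if role not in ("executor", "manager"):
        raise ValueError("role must be 'executor' or 'manager'")
    args: List[str] = []
    for kind, values in (("limits", limits or {}), ("requests", requests or {})):
        for option, value in values.items():
            if option not in ("cpu", "memory"):
                raise ValueError(f"unsupported option: {option}")
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{kind}.{option} must be a natural number")
            args += ["--config-property", f"heron.kubernetes.{role}.{kind}.{option}={value}"]
    return args


resource_args = resource_config_properties(
    "manager",
    limits={"cpu": 2, "memory": 3},
    requests={"cpu": 1, "memory": 2},
)
print(" ".join(resource_args))
```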

Example

An example submission command is as follows.

Limits and Requests:

~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--config-property heron.kubernetes.manager.limits.cpu=2 \
--config-property heron.kubernetes.manager.limits.memory=3 \
--config-property heron.kubernetes.manager.requests.cpu=1 \
--config-property heron.kubernetes.manager.requests.memory=2