This repository demonstrates some of the basic features and lifecycle of Operators. More concretely, an H2O Operator is deployed on OCP4 and the Monitoring stack is used to collect metrics and send alerts. Finally, it is explained how to delete the project.
The first step is to define the OperatorGroup, which selects target namespaces in which to generate required RBAC access for its member Operators, and a Subscription to subscribe a namespace to an Operator.
The list of Operators available to the cluster from the OperatorHub can be found by:
$ oc get packagemanifests -n openshift-marketplace
NAME                                             CATALOG               AGE
fuse-console                                     Red Hat Operators     20h
percona-xtradb-cluster-operator-certified-rhmp   Red Hat Marketplace   20h
neuvector-community-operator                     Community Operators   20h
federatorai-certified                            Certified Operators   20h
apicast-operator                                 Red Hat Operators     20h
atlasmap-operator                                Community Operators   20h
aws-event-sources-operator-certified             Certified Operators   20h
aws-efs-operator                                 Community Operators   20h
k10-kasten-operator                              Certified Operators   20h
...
$ oc get packagemanifests -n openshift-marketplace | grep h2o
h2o-operator                                     Certified Operators   20h
Before installing the Operator, we need to check which Install Modes are available and which CRDs will be generated:
$ oc describe packagemanifests h2o-operator -n openshift-marketplace
Name:         h2o-operator
Namespace:    openshift-marketplace
Labels:       catalog=certified-operators
...
Status:
  Catalog Source:               certified-operators
  Catalog Source Display Name:  Certified Operators
  Catalog Source Namespace:     openshift-marketplace
  Catalog Source Publisher:     Red Hat
  Channels:
    Current CSV:  h2o-operator.v0.1.0
    Current CSV Desc:
      Annotations:
        Alm - Examples:   [{"apiVersion":"h2o.ai/v1beta","kind":"H2O","metadata":{"name":"h2o-test"},"spec":{"nodes":1,"resources":{"cpu":1,"memory":"256Mi","memoryPercentage":90},"customImage":{"image":"registry.connect.redhat.com/h2oai/h2o@sha256:62500ca14adacd1164d7ef4f64ccc96c3d0ad90ddf29c3e02d1aa82bd42aa1a4"}}}]
        Capabilities:     Basic Install
        Categories:       AI/Machine Learning
        Certified:        true
        Container Image:  registry.connect.redhat.com/h2oai/h2o-operator@sha256:5dca5476c060b49cfc3e1e13b2c3c92920ae8789a092169a2c8457211895298d
        Created At:       2021-02-09T11:00:58UTC
        Description:      A Kubernetes operator for H2O Open Source, Distributed, Fast & Scalable Machine Learning Platform.
        Repository:       https://github.com/h2oai/h2o-kubernetes
        Support:          H2O.ai
      Apiservicedefinitions:
      Customresourcedefinitions:
        Owned:
          Description:   H2O
          Display Name:  H2O
          Kind:          H2O
          Name:          h2os.h2o.ai
          Version:       v1beta
      Description:   ...
      Display Name:  H2O Open Source Machine Learning Operator
      Install Modes:
        Supported:  true
        Type:       OwnNamespace
        Supported:  true
        Type:       SingleNamespace
        Supported:  true
        Type:       MultiNamespace
        Supported:  false
        Type:       AllNamespaces
...
An Operator is considered a member of an OperatorGroup if the following conditions are true:
- The Operator’s CSV exists in the same namespace as the OperatorGroup.
- The Operator’s CSV’s InstallModes support the set of namespaces targeted by the OperatorGroup.
The h2o-01-operator.yaml template creates the OperatorGroup in its own namespace and subscribes to the h2o-operator:
$ OPERATOR_NAMESPACE=h2o-operator # define the namespace
$ oc process -f templates/h2o-01-operator.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE} | oc apply -f -
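For reference, the objects created by the template look roughly like the following minimal sketch (the channel name is an assumption, since it is not shown in the describe output above; the catalog source comes from the packagemanifest):

$ cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: h2o-operator
  namespace: ${OPERATOR_NAMESPACE}
spec:
  targetNamespaces:
    - ${OPERATOR_NAMESPACE}   # OwnNamespace/SingleNamespace install mode
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: h2o-operator
  namespace: ${OPERATOR_NAMESPACE}
spec:
  channel: stable                          # assumed channel name
  name: h2o-operator
  source: certified-operators              # catalog source from the packagemanifest
  sourceNamespace: openshift-marketplace
EOF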
At this point, OLM is now aware of the selected Operator. A cluster service version (CSV) for the Operator should appear in the target namespace, and APIs provided by the Operator should be available for creation. Let’s check it:
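For instance, the installed CSV can be inspected with:

$ oc get csv -n ${OPERATOR_NAMESPACE}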
$ oc get crd -n ${OPERATOR_NAMESPACE} | grep "h2o"
h2os.h2o.ai   2021-04-19T11:32:49Z
$ oc describe crd h2os.h2o.ai
Name:         h2os.h2o.ai
Namespace:
Labels:       operators.coreos.com/h2o-operator.h2o-operator1=
Annotations:  <none>
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2021-04-19T11:32:49Z
  Generation:          1
  Managed Fields:
    API Version:  apiextensions.k8s.io/v1beta1
    Fields Type:  FieldsV1
    ...
Spec:
  Conversion:
    Strategy:  None
  Group:       h2o.ai
  Names:
    Kind:       H2O
    List Kind:  H2OList
    Plural:     h2os
    Singular:   h2o
  Preserve Unknown Fields:  true
  Scope:                    Namespaced
  Versions:
    Name:  v1beta
    Schema:
      openAPIV3Schema:
        Properties:
          Spec:
            One Of:
              Required:
                version
              Required:
                customImage
            Properties:
              Custom Image:
                Properties:
                  Command:
                    Type:  string
                  Image:
                    Type:  string
                Required:
                  image
                Type:  object
              Nodes:
                Type:  integer
              Resources:
                Properties:
                  Cpu:
                    Minimum:  1
                    Type:     integer
                  Memory:
                    Pattern:  ^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$
                    Type:     string
                  Memory Percentage:
                    Maximum:  100
                    Minimum:  1
                    Type:     integer
                Required:
                  cpu
                  memory
                Type:  object
              Version:
                Type:  string
            Required:
              nodes
              resources
            Type:  object
          Status:
            Type:  object
        Type:  object
    Served:   true
    Storage:  true
    Subresources:
      Status:
Status:
  Accepted Names:
    Kind:       H2O
    List Kind:  H2OList
    Plural:     h2os
    Singular:   h2o
  Conditions:
    ...
  Stored Versions:
    v1beta
Now, a CR of kind H2O can be deployed into the namespace:
$ oc process -f templates/h2o-02-instance.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE} | oc apply -f -
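The rendered CR essentially matches the Alm - Examples annotation shown earlier; applied by hand it would look like this:

$ cat <<EOF | oc apply -f -
apiVersion: h2o.ai/v1beta
kind: H2O
metadata:
  name: h2o-test
  namespace: ${OPERATOR_NAMESPACE}
spec:
  nodes: 1
  resources:
    cpu: 1
    memory: 256Mi
    memoryPercentage: 90
  customImage:
    image: registry.connect.redhat.com/h2oai/h2o@sha256:62500ca14adacd1164d7ef4f64ccc96c3d0ad90ddf29c3e02d1aa82bd42aa1a4
EOF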
Once the instance is deployed, a route needs to be created to expose the service (svc):
$ oc expose svc/h2o-test -n ${OPERATOR_NAMESPACE}
By executing oc get routes we can copy the route URL and access the H2O GUI in our browser of choice.
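Alternatively, the host can be extracted directly:

$ oc get route h2o-test -n ${OPERATOR_NAMESPACE} -o jsonpath='{.spec.host}'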
A typical OpenShift monitoring stack includes Prometheus for monitoring both systems and services, and Grafana for analyzing and visualizing metrics.
Administrators often want to write custom queries and create custom dashboards in Grafana. However, the Grafana instance provided with the monitoring stack (and its dashboards) is read-only. To solve this problem, we can use the community-powered Grafana Operator provided by OperatorHub. I will follow the implementation explained in detail here.
As with the H2O operator, we first need to subscribe and deploy the operator using the following template:
OPERATOR_NAMESPACE="grafana"
$ oc process -f templates/grafana-01-operator.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE}| oc apply -f -
Now, a Grafana instance is created using the operator:
$ oc process -f templates/grafana-02-instance.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE} | oc apply -f -
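The instance template is roughly equivalent to a Grafana CR like the following sketch (the admin credentials and the dashboard label selector are assumptions; the actual template may differ):

$ cat <<EOF | oc apply -f -
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana
  namespace: ${OPERATOR_NAMESPACE}
spec:
  ingress:
    enabled: true            # exposes a route on OpenShift
  config:
    auth:
      disable_signout_menu: true
    security:
      admin_user: admin      # assumed credentials, change them
      admin_password: secret
  dashboardLabelSelector:
    - matchExpressions:
        - key: app
          operator: In
          values: [grafana]
EOF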
A GrafanaDataSource that points to the Prometheus metrics is created:
$ oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount -n ${OPERATOR_NAMESPACE}
$ BEARER_TOKEN=$(oc serviceaccounts get-token grafana-serviceaccount -n ${OPERATOR_NAMESPACE})
$ oc process -f templates/grafana-03-datasource.yaml -p BEARER_TOKEN=${BEARER_TOKEN} | oc apply -f -
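A sketch of the resulting datasource, assuming it points at the cluster's Thanos Querier and authenticates with the service account token created above:

$ cat <<EOF | oc apply -f -
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus-grafanadatasource
  namespace: ${OPERATOR_NAMESPACE}
spec:
  name: prometheus-grafanadatasource.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: https://thanos-querier.openshift-monitoring.svc:9091
      isDefault: true
      jsonData:
        httpHeaderName1: Authorization   # header carrying the token
        tlsSkipVerify: true
      secureJsonData:
        httpHeaderValue1: "Bearer ${BEARER_TOKEN}"
EOF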
And finally, the Grafana dashboard is created:
$ DASHBOARD_NAME="grafana-dashboard-h2o"
# Create a ConfigMap containing the dashboard
$ oc create configmap $DASHBOARD_NAME --from-file=dashboard=grafana/$DASHBOARD_NAME.json -n ${OPERATOR_NAMESPACE}
# Create a Dashboard object that automatically updates Grafana
$ oc process -f templates/grafana-04-dashboard.yaml -p DASHBOARD_NAME=$DASHBOARD_NAME | oc apply -f -
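A sketch of what such a Dashboard object can look like, assuming the operator's configMapRef support is used to pull the JSON from the ConfigMap created above (the label must match the Grafana instance's dashboardLabelSelector):

$ cat <<EOF | oc apply -f -
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: ${DASHBOARD_NAME}
  namespace: ${OPERATOR_NAMESPACE}
  labels:
    app: grafana                # must match the dashboardLabelSelector
spec:
  configMapRef:
    name: ${DASHBOARD_NAME}     # the ConfigMap created above
    key: dashboard              # key set by --from-file=dashboard=...
EOF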
The Alertmanager manages incoming alerts; this includes silencing, inhibition, aggregation, and sending out notifications through methods such as email, PagerDuty, and HipChat.
An implementation example using email is given in [templates/alertmanager/alertmanager.yaml](templates/alertmanager/alertmanager.yaml).
> ℹ️ You need to create an [App Password](https://support.google.com/accounts/answer/185833?hl=en). To do that, go to Account Settings → Security → Signing in to Google → App password (if you don’t see App password as an option, you probably haven’t set up 2-Step Verification and will need to do that first). Copy the newly-created password.
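A minimal email receiver configuration could look like the following (all addresses and the Gmail smarthost are placeholder assumptions; adapt them to your receiver):

$ cat > templates/alertmanager/alertmanager.yml <<EOF
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  receiver: email-notifications
receivers:
  - name: email-notifications
    email_configs:
      - to: destination@example.com
        from: sender@gmail.com
        smarthost: smtp.gmail.com:587
        auth_username: sender@gmail.com
        auth_password: <app-password>   # the App Password created above
        send_resolved: true
EOF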
The Alertmanager configuration can be updated by replacing the content of the alertmanager-main Secret (note that the key inside the Secret must be alertmanager.yaml):
$ oc create secret generic alertmanager-main \
  --from-file=alertmanager.yaml=templates/alertmanager/alertmanager.yml \
  --dry-run -o=yaml -n openshift-monitoring |\
  oc replace secret --filename=- -n openshift-monitoring
Moreover, we can configure the Alertmanager through the OpenShift 4 web console, in Administration → Cluster Settings → Global configuration → Alertmanager.
If everything works as expected, the receiver should get notifications like the following one:
When we want to remove a namespace, it is recommended to delete all the objects in that namespace before removing the namespace itself.
$ oc get all -n h2o-operator
NAME                               READY   STATUS    RESTARTS   AGE
pod/h2o-operator-6c9cbdc57-m4n7v   1/1     Running   0          2m30s
pod/h2o-test-0                     1/1     Running   0          2m18s

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/h2o-test   ClusterIP   None         <none>        80/TCP    2m18s

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/h2o-operator   1/1     1            1           2m30s

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/h2o-operator-6c9cbdc57   1         1         1       2m30s

NAME                        READY   AGE
statefulset.apps/h2o-test   1/1     2m18s

NAME                                HOST/PORT                                                  PATH   SERVICES   PORT    TERMINATION   WILDCARD
route.route.openshift.io/h2o-test   h2o-test-h2o-operator.apps.apps.sandbox1684.opentlc.com          h2o-test   54321                 None
$
Making use of labels, one can remove all objects with a specific label, as in the example below.
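For instance, assuming the instance objects carried a hypothetical label app=h2o-test:

$ oc delete all -l app=h2o-test -n h2o-operator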
Nevertheless, in this section I am going to focus on how to force the deletion of the namespace if things go wrong. Let’s do it:
$ oc project default # leave the project we want to delete
$ oc delete project h2o-operator
project.project.openshift.io "h2o-operator" deleted
$ oc get all -n h2o-operator
No resources found in h2o-operator namespace.
$ oc get project
NAME           DISPLAY NAME         STATUS
default                             Active
grafana        Grafana - Operator   Active
h2o-operator   H2O - Operator       Terminating
Mmmmmm… even though there are no objects left in the namespace, it gets stuck in the Terminating state.
If we dig deeper, we find that the H2O instance cannot be deleted:
$ oc get H2O -n h2o-operator
NAME       AGE
h2o-test   98m
Following this, we need to remove the value in finalizers to force the deletion of the instance:
$ oc edit H2O h2o-test -n h2o-operator
Name:         h2o-test
Namespace:    h2o-operator
Labels:       <none>
Annotations:  <none>
API Version:  h2o.ai/v1beta
...
Metadata:
  Creation Timestamp:             2021-04-22T14:48:36Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-04-22T16:24:24Z
  Finalizers:
    h2os.h2o.ai   # remove this line
...
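Alternatively, the finalizers can be cleared non-interactively with a merge patch, which is equivalent to deleting the line in the editor:

$ oc patch h2o h2o-test -n h2o-operator --type merge -p '{"metadata":{"finalizers":[]}}'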
Now, checking back again for the H2O instance and the projects:
$ oc get H2O -n h2o-operator
No resources found in h2o-operator namespace.
$ oc get project
NAME              DISPLAY NAME         STATUS
default                                Active
grafana           Grafana - Operator   Active
kube-node-lease                        Active
kube-public                            Active
We made it!!