Skip to content

Install the H2o operator and instance into an OCP4 cluster

Notifications You must be signed in to change notification settings

hect1995/h2o-ocp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

H2O Operator Openshift 4

This repository demonstrates some of the basic features and lifecycle of operators. More concretelly, a H2O operator is to be deployed on OCP4 and the Monitoring stack is used to get metrics and send alerts. Finally it will be explained how to delete the project.

1. Launch H2O

1.1. Introduction

The first step is to define the OperatorGroup, which selects target namespaces in which to generate required RBAC access for its member Operators, and a Subscription to subscribe a namespace to an Operator.
The list of Operators available to the cluster from the OperatorHub can be found by:

$ oc get packagemanifests -n openshift-marketplace
NAME                                                 CATALOG               AGE
fuse-console                                         Red Hat Operators     20h
percona-xtradb-cluster-operator-certified-rhmp       Red Hat Marketplace   20h
neuvector-community-operator                         Community Operators   20h
federatorai-certified                                Certified Operators   20h
apicast-operator                                     Red Hat Operators     20h
atlasmap-operator                                    Community Operators   20h
aws-event-sources-operator-certified                 Certified Operators   20h
aws-efs-operator                                     Community Operators   20h
k10-kasten-operator                                  Certified Operators   20h
...

$ oc get packagemanifests -n openshift-marketplace | grep h2o
h2o-operator                                         Certified Operators   20h

Before installing the operator we need to check what are the Install Modes available and the CRDs that will be generated:

$ oc describe packagemanifests h2o-operator -n openshift-marketplace
Name:         h2o-operator
Namespace:    openshift-marketplace
Labels:       catalog=certified-operators
...
Status:
  Catalog Source:               certified-operators
  Catalog Source Display Name:  Certified Operators
  Catalog Source Namespace:     openshift-marketplace
  Catalog Source Publisher:     Red Hat
  Channels:
    Current CSV:  h2o-operator.v0.1.0
    Current CSV Desc:
      Annotations:
        Alm - Examples:   [{"apiVersion":"h2o.ai/v1beta","kind":"H2O","metadata":{"name":"h2o-test"},"spec":{"nodes":1,"resources":{"cpu":1,"memory":"256Mi","memoryPercentage":90},"customImage":{"image":"registry.connect.redhat.com/h2oai/h2o@sha256:62500ca14adacd1164d7ef4f64ccc96c3d0ad90ddf29c3e02d1aa82bd42aa1a4"}}}]
        Capabilities:     Basic Install
        Categories:       AI/Machine Learning
        Certified:        true
        Container Image:  registry.connect.redhat.com/h2oai/h2o-operator@sha256:5dca5476c060b49cfc3e1e13b2c3c92920ae8789a092169a2c8457211895298d
        Created At:       2021-02-09T11:00:58UTC
        Description:      A Kubernetes operator for H2O  Open Source, Distributed, Fast & Scalable Machine Learning Platform.
        Repository:       https://github.com/h2oai/h2o-kubernetes
        Support:          H2O.ai
      Apiservicedefinitions:
      Customresourcedefinitions:
        Owned:
          Description:   H2O
          Display Name:  H2O
          Kind:          H2O
          Name:          h2os.h2o.ai
          Version:       v1beta
      Description: ...
            Display Name:  H2O Open Source Machine Learning Operator
      Install Modes:
        Supported:  true
        Type:       OwnNamespace
        Supported:  true
        Type:       SingleNamespace
        Supported:  true
        Type:       MultiNamespace
        Supported:  false
        Type:       AllNamespaces
      ...

An Operator is considered a member of an OperatorGroup if the following conditions are true:

  • The Operator’s CSV exists in the same namespace as the OperatorGroup.

  • The Operator’s CSV’s InstallModes support the set of namespaces targeted by the OperatorGroup.

1.2. Deploy the operator

The h2o-01-operator.yaml template creates the OperatorGroup to its own namespace and subscribes to the h2o-operator:

OPERATOR_NAMESPACE=h2o-operator # Define the namespace
$ oc process -f templates/h2o-01-operator.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE} | oc apply -f -

At this point, OLM is now aware of the selected Operator. A cluster service version (CSV) for the Operator should appear in the target namespace, and APIs provided by the Operator should be available for creation. Let’s check it:

$ oc get crd -n ${OPERATOR_NAMESPACE}| grep "h2o"
h2os.h2o.ai             2021-04-19T11:32:49Z

$ oc describe crd h2os.h2o.ai
Name:         h2os.h2o.ai
Namespace:
Labels:       operators.coreos.com/h2o-operator.h2o-operator1=
Annotations:  <none>
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2021-04-19T11:32:49Z
  Generation:          1
  Managed Fields:
    API Version:  apiextensions.k8s.io/v1beta1
    Fields Type:  FieldsV1
    ...
Spec:
  Conversion:
    Strategy:  None
  Group:       h2o.ai
  Names:
    Kind:                   H2O
    List Kind:              H2OList
    Plural:                 h2os
    Singular:               h2o
  Preserve Unknown Fields:  true
  Scope:                    Namespaced
  Versions:
    Name:  v1beta
    Schema:
      openAPIV3Schema:
        Properties:
          Spec:
            One Of:
              Required:
                version
              Required:
                customImage
            Properties:
              Custom Image:
                Properties:
                  Command:
                    Type:  string
                  Image:
                    Type:  string
                Required:
                  image
                Type:  object
              Nodes:
                Type:  integer
              Resources:
                Properties:
                  Cpu:
                    Minimum:  1
                    Type:     integer
                  Memory:
                    Pattern:  ^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$
                    Type:     string
                  Memory Percentage:
                    Maximum:  100
                    Minimum:  1
                    Type:     integer
                Required:
                  cpu
                  memory
                Type:  object
              Version:
                Type:  string
            Required:
              nodes
              resources
            Type:  object
          Status:
            Type:  object
        Type:      object
    Served:        true
    Storage:       true
    Subresources:
      Status:
Status:
  Accepted Names:
    Kind:       H2O
    List Kind:  H2OList
    Plural:     h2os
    Singular:   h2o
  Conditions:
    ...
  Stored Versions:
    v1beta

1.3. Deploy and expose an H2O instance

Now, a CR of kind H2O can be deployed into the namespace:

$ oc process -f templates/h2o-02-instance.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE} | oc apply -f -

Once the instance is deployed, a route needs to be created to expose the service(svc)

$ oc expose svc/h2o-test -n ${OPERATOR_NAMESPACE}

By executing oc get routes we can copy the route create it and access the GUI of H2o in our desired navigation explorer.

2. Monitoring

A typical OpenShift monitoring stack includes Prometheus for monitoring both systems and services, and Grafana for analyzing and visualizing metrics.

Administrators are often looking to write custom queries and create custom dashboards in Grafana. However, Grafana instances provided with the monitoring stack (and its dashboards) are read-only. To solve this problem, we can use the community-powered Grafana operator provided by OperatorHub. I will follow the implementation accurately explained here.

As with the H2O operator, we first need to subscribe and deploy the operator using the following template:

OPERATOR_NAMESPACE="grafana"
$ oc process -f templates/grafana-01-operator.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE}| oc apply -f -

Now, a Grafana instance is created using the operator:

oc process -f templates/grafana-02-instance.yaml -p OPERATOR_NAMESPACE=${OPERATOR_NAMESPACE}| oc apply -f -

A GrafanaDataSource, that points to the Prometheus metrics, is created:

oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount -n ${OPERATOR_NAMESPACE}
BEARER_TOKEN=$(oc serviceaccounts get-token grafana-serviceaccount -n ${OPERATOR_NAMESPACE})
oc process -f templates/grafana-03-datasource.yaml -p BEARER_TOKEN=${BEARER_TOKEN} | oc apply -f -

And finally the Grafana dashboard is to be created:

DASHBOARD_NAME="grafana-dashboard-h2o"
# Create a configMap containing the Dashboard
oc create configmap $DASHBOARD_NAME --from-file=dashboard=grafana/$DASHBOARD_NAME.json -n ${OPERATOR_NAMESPACE}
# Create a Dashboard object that automatically updates Grafana
oc process -f templates/grafana-04-dashboard.yaml -p DASHBOARD_NAME=$DASHBOARD_NAME | oc apply -f -

3. Alert Manager

The Alertmanager manages incoming alerts; this includes silencing, inhibition, aggregation, and sending out notifications through methods such as email, PagerDuty, and HipChat.

An implementation example through email is given in in [templates/alertmanager/alertmanager.yaml](templates/alertmanager/alertmanager.yaml).

ℹ️
You need to create an [App Password](https://support.google.com/accounts/answer/185833?hl=en). To do that, go to Account Settings → Security → Signing in to Google → App password (if you don’t see App password as an option, you probably haven’t set up 2-Step Verification and will need to do that first). Copy the newly-created password.

The Alertmanager configuration can be updated replacing the content of the alertmanager-main Secret.

$ oc create secret generic alertmanager-main \
    --from-file=templates/alertmanager/alertmanager.yml \
        --dry-run -o=yaml -n openshift-monitoring |\
            oc replace secret --filename=- -n openshift-monitoring

Moreover, We can configure the Alertmanager through the Openshift 4 platform, in Administration → Cluster Settings → Global configuration → Alertmanager

ocp alertmanager gui

If everything works as expected the receiver should receive notifications like the following one:

alert manager notification

4. Delete a namespace

Once we want to remove a namespace, it is recommended to remove all the object in that namespace before removing the namespace.

$ oc get all -n h2o-operator
NAME                               READY   STATUS    RESTARTS   AGE
pod/h2o-operator-6c9cbdc57-m4n7v   1/1     Running   0          2m30s
pod/h2o-test-0                     1/1     Running   0          2m18s

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/h2o-test   ClusterIP   None         <none>        80/TCP    2m18s

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/h2o-operator   1/1     1            1           2m30s

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/h2o-operator-6c9cbdc57   1         1         1       2m30s

NAME                        READY   AGE
statefulset.apps/h2o-test   1/1     2m18s

NAME                                HOST/PORT                                                 PATH   SERVICES   PORT    TERMINATION   WILDCARD
route.route.openshift.io/h2o-test   h2o-test-h2o-operator.apps.apps.sandbox1684.opentlc.com          h2o-test   54321                 None

$

Making use of labels one can remove all objects with a specific label. Nevertheless in this section I am going to focus on how to force the deletion of the namespace if things go wrong, let’s do it:

$ oc project default # leave the project we want to delete
$ oc delete project h2o-operator
project.project.openshift.io "h2o-operator" deleted

$ oc get all -n h2o-operator
No resources found in h2o-operator namespace.

$ oc get project
NAME                                               DISPLAY NAME         STATUS
default                                                                 Active
grafana                                            Grafana - Operator   Active
h2o-operator                                       H2O - Operator       Terminating

Mmmmmm…​ even though there are no longer objects in the namespace, it gets stuck in the Terminating state. If we dig deeper, we find that the H2O instace cannot be deleted.

$ oc get H2O -n h2o-operator
NAME       AGE
h2o-test   98m

Following this, we need to remove the value in finalizers to force the deletion of the instance:

$ oc edit H2O h2o-test -n h2o-operator
Name:         h2o-test
Namespace:    h2o-operator
Labels:       <none>
Annotations:  <none>
API Version:  h2o.ai/v1beta
...
Metadata:
  Creation Timestamp:             2021-04-22T14:48:36Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-04-22T16:24:24Z
  Finalizers:
    h2os.h2o.ai # remove this line
...

Now, checking back again for the H2O instance and the projects:

$ oc get H2O -n h2o-operator
No resources found in h2o-operator namespace.

$ oc get project
NAME                                               DISPLAY NAME         STATUS
default                                                                 Active
grafana                                            Grafana - Operator   Active
kube-node-lease                                                         Active
kube-public                                                             Active

We made it!!

Releases

No releases published

Packages

No packages published