oneke_ops

Michal Opala edited this page Apr 23, 2024 · 8 revisions

Operating OneKE

Accessing K8s Cluster

The leader VNF node runs an HAProxy instance that by default exposes Kubernetes API port 6443 on the public VIP address over the HTTPS protocol (secured with two-way SSL/TLS certificates).

This HAProxy instance can be used in two ways:

  • As a stable Control Plane endpoint for the whole Kubernetes cluster.
  • As an external Kubernetes API endpoint that can be reached from outside the internal VNET.
graph LR;
    internet --- vnf;
    vnf --- master & worker & storage;
    internet((Internet));
    style vnf text-align:left
    style master text-align:left
    style worker text-align:left
    style storage text-align:left
    vnf[["vnf (NAT 🔀)"<br>haproxy - *:6443<br><hr>eth0:10.2.11.86<br><hr>eth1:172.20.0.68]];
    master[master<br>kube-apiserver - *:6443<br><hr>eth0:172.20.0.101<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];
    worker[worker<br><hr>eth0:172.20.0.102<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];
    storage[storage<br><hr>eth0:172.20.0.103<br><hr>GW:172.20.0.86<br>DNS:1.1.1.1];

To access the Kubernetes API you'll need a kubeconfig file which, in the case of RKE2, can be copied from the /etc/rancher/rke2/rke2.yaml file located on every master node, for example:

$ install -d ~/.kube/
$ scp -J root@10.2.11.86 root@172.20.0.101:/etc/rancher/rke2/rke2.yaml ~/.kube/config
Warning: Permanently added '10.2.11.86' (ED25519) to the list of known hosts.
Warning: Permanently added '172.20.0.101' (ED25519) to the list of known hosts.
rke2.yaml

Additionally, you must adjust the Control Plane endpoint inside the file to point to the public VIP:

$ gawk -i inplace -f- ~/.kube/config <<'EOF'
/^    server: / { $0 = "    server: https://10.2.11.86:6443" }
{ print }
EOF
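If gawk (or its inplace extension) is not available on your workstation, the same edit can be sketched with plain sed. The sample file below is only an illustrative stand-in for ~/.kube/config, and the KUBECONFIG_FILE and VIP variable names are assumptions of this sketch rather than anything defined by OneKE:

```shell
# Illustrative stand-in for ~/.kube/config; on a real system point KUBECONFIG_FILE at that file instead.
KUBECONFIG_FILE="$(mktemp)"
cat > "$KUBECONFIG_FILE" <<'EOF'
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://127.0.0.1:6443
  name: default
EOF

VIP=10.2.11.86  # public VIP address used throughout this page

# Rewrite only the "server:" line. Writing to a temporary file first avoids
# depending on the non-portable "sed -i" flag.
sed "s|^    server: .*|    server: https://${VIP}:6443|" "$KUBECONFIG_FILE" > "${KUBECONFIG_FILE}.new"
mv "${KUBECONFIG_FILE}.new" "$KUBECONFIG_FILE"

# Show the resulting endpoint.
grep 'server:' "$KUBECONFIG_FILE"
```

Like the gawk version, this assumes the server line is indented by exactly four spaces, as it is in the kubeconfig generated by RKE2.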

And then your local kubectl command should work just fine:

$ kubectl get nodes
NAME                    STATUS   ROLES                       AGE    VERSION
oneke-ip-172-20-0-101   Ready    control-plane,etcd,master   33m    v1.27.2+rke2r1
oneke-ip-172-20-0-102   Ready    <none>                      28m    v1.27.2+rke2r1
oneke-ip-172-20-0-103   Ready    <none>                      28m    v1.27.2+rke2r1
oneke-ip-172-20-0-104   Ready    control-plane,etcd,master   12m    v1.27.2+rke2r1
oneke-ip-172-20-0-105   Ready    control-plane,etcd,master   10m    v1.27.2+rke2r1

Important

If you'd like to use a custom domain name for the Control Plane endpoint instead of the direct public VIP address, you need to add the domain to the ONEAPP_K8S_EXTRA_SANS context parameter, for example localhost,127.0.0.1,k8s.yourdomain.it, and set the domain inside the ~/.kube/config file as well. You can set up your domain in a public/private DNS server or in your local /etc/hosts file, whatever works for you.

Accessing K8s API via SSH tunnels

By default, the Kubernetes API server's extra SANs are set to localhost,127.0.0.1, which allows you to access the Kubernetes API via SSH tunnels.

Note

We recommend using the ProxyCommand SSH feature.

Download the /etc/rancher/rke2/rke2.yaml kubeconfig file:

$ install -d ~/.kube/
$ scp -o ProxyCommand='ssh -A root@10.2.11.86 -W %h:%p' root@172.20.0.101:/etc/rancher/rke2/rke2.yaml ~/.kube/config

Note

Here 10.2.11.86 is the public VIP address and 172.20.0.101 is the private address of a master node inside the private VNET.

Create an SSH tunnel that forwards TCP port 6443:

$ ssh -o ProxyCommand='ssh -A root@10.2.11.86 -W %h:%p' -L 6443:localhost:6443 root@172.20.0.101

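Then run kubectl in another terminal. The unmodified rke2.yaml already points at https://127.0.0.1:6443, so the server address does not need to be changed: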

$ kubectl get nodes
NAME                    STATUS   ROLES                       AGE    VERSION
oneke-ip-172-20-0-101   Ready    control-plane,etcd,master   58m    v1.27.2+rke2r1
oneke-ip-172-20-0-102   Ready    <none>                      52m    v1.27.2+rke2r1
oneke-ip-172-20-0-103   Ready    <none>                      52m    v1.27.2+rke2r1
oneke-ip-172-20-0-104   Ready    control-plane,etcd,master   31m    v1.27.2+rke2r1
oneke-ip-172-20-0-105   Ready    control-plane,etcd,master   29m    v1.27.2+rke2r1
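If you open the tunnel often, the jump host and the port forward can be kept in your SSH client configuration instead of on the command line. The following is a minimal sketch that uses ProxyJump (the configuration counterpart of the -J flag shown earlier) rather than ProxyCommand. The oneke-vnf and oneke-master host aliases are hypothetical names chosen for this example, and the file is written to a temporary location instead of ~/.ssh/config:

```shell
# Stand-in for ~/.ssh/config so the sketch does not touch your real configuration.
SSH_CONFIG="$(mktemp)"

# Hypothetical host aliases; the addresses are the ones used in the examples above.
cat > "$SSH_CONFIG" <<'EOF'
Host oneke-vnf
    HostName 10.2.11.86
    User root

Host oneke-master
    HostName 172.20.0.101
    User root
    ProxyJump oneke-vnf
    LocalForward 6443 localhost:6443
EOF

cat "$SSH_CONFIG"

# With these stanzas in ~/.ssh/config the tunnel could be opened with (not executed here):
#   ssh -N oneke-master
```

The -N flag keeps the connection open for forwarding only, without starting a remote shell.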

Usage Examples

Create a Longhorn PVC

To create a 4 GiB persistent volume, save the following manifest as nginx-pvc.yaml and apply it using kubectl:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 4Gi
  storageClassName: longhorn-retain
$ kubectl apply -f nginx-pvc.yaml
persistentvolumeclaim/nginx created
$ kubectl get pvc,pv
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/nginx   Bound    pvc-5b0f9618-b840-4544-bccc-6479c83b49d3   4Gi        RWO            longhorn-retain   78s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS      REASON   AGE
persistentvolume/pvc-5b0f9618-b840-4544-bccc-6479c83b49d3   4Gi        RWO            Retain           Bound    default/nginx   longhorn-retain            76s

Important

The Retain reclaim policy may protect your persistent data from accidental removal. Always back up your data!
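Provisioning a volume can take a moment, so a script that creates the claim and uses it right away may want to wait for the Bound status shown above. The helper below is a minimal sketch of such a wait. The wait_for_pvc_bound function name and its default attempt count and delay are assumptions made for this example:

```shell
# Poll a PVC until it reports the Bound phase.
# Usage: wait_for_pvc_bound NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_pvc_bound() {
  local name="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local phase=""
  while [ "$attempts" -gt 0 ]; do
    # .status.phase stays Pending until a volume has been provisioned and bound.
    phase="$(kubectl get pvc "$name" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Bound" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  echo "PVC $name is not bound (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For the claim created above it would be called as wait_for_pvc_bound nginx, for example directly after kubectl apply -f nginx-pvc.yaml.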

Create an NGINX Deployment

To deploy an NGINX instance that uses the PVC created previously, save the following manifest as nginx-deployment.yaml and apply it using kubectl:

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: http
        image: nginx:alpine
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        volumeMounts:
        - mountPath: "/persistent/"
          name: nginx
      volumes:
      - name: nginx
        persistentVolumeClaim:
          claimName: nginx
$ kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx created
$ kubectl get deployments,pods
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           32s

NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6b5d47679b-sjd9p   1/1     Running   0          32s

Create a Traefik IngressRoute

To expose the running NGINX instance over HTTP on port 80 of the public VNF VIP address, save the following manifest as nginx-svc-ingressroute.yaml and apply it using kubectl:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx
spec:
  entryPoints: [web]
  routes:
    - kind: Rule
      match: Path(`/`)
      services:
        - kind: Service
          name: nginx
          port: 80
          scheme: http
$ kubectl apply -f nginx-svc-ingressroute.yaml
service/nginx created
ingressroute.traefik.containo.us/nginx created
$ kubectl get svc,ingressroute
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.43.0.1     <none>        443/TCP   3h18m
service/nginx        ClusterIP   10.43.99.36   <none>        80/TCP    63s

NAME                                     AGE
ingressroute.traefik.containo.us/nginx   63s

Verify that the new IngressRoute custom resource (an instance of the IngressRoute Custom Resource Definition) is operational:

$ curl -fsSL http://10.2.11.86/ | grep title
<title>Welcome to nginx!</title>

Create a MetalLB LoadBalancer service

To expose the running NGINX instance over HTTP on port 80 using a private LoadBalancer service provided by MetalLB, save the following manifest as nginx-loadbalancer.yaml and apply it using kubectl:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  selector:
    app: nginx
  type: LoadBalancer
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
$ kubectl apply -f nginx-loadbalancer.yaml
service/nginx-lb created
$ kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.43.0.1       <none>        443/TCP        3h25m
nginx        ClusterIP      10.43.99.36     <none>        80/TCP         8m50s
nginx-lb     LoadBalancer   10.43.222.235   172.20.0.87   80:30050/TCP   73s

Verify that the new LoadBalancer service is operational:

$ curl -fsSL http://172.20.0.87/ | grep title
<title>Welcome to nginx!</title>

Upgrade

K8s clusters can be upgraded with the System Upgrade Controller provided by RKE2. Here's a handy bash snippet to illustrate the procedure:

#!/usr/bin/env bash

: "${SUC_VERSION:=0.9.1}"
: "${RKE2_VERSION:=v1.24.2-rc2+rke2r1}"

set -o errexit -o nounset

# Deploy the System Upgrade Controller.
kubectl apply -f "https://github.com/rancher/system-upgrade-controller/releases/download/v${SUC_VERSION}/system-upgrade-controller.yaml"

# Wait for required Custom Resource Definitions to appear.
for RETRY in 9 8 7 6 5 4 3 2 1 0; do
  if kubectl get crd/plans.upgrade.cattle.io --no-headers; then break; fi
  sleep 5
done && [[ "$RETRY" -gt 0 ]]

# Plan the upgrade.
kubectl apply -f- <<EOF
---
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
       - {key: rke2-upgrade, operator: Exists}
       - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
       # When using k8s version 1.19 or older, swap control-plane with master
       - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: "$RKE2_VERSION"
---
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: agent
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: NotIn, values: ["true"]}
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/rke2-upgrade
  serviceAccountName: system-upgrade
  tolerations:
    - key: node.longhorn.io/create-default-disk
      value: "true"
      operator: Equal
      effect: NoSchedule
  cordon: true
  drain:
    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: "$RKE2_VERSION"
EOF

# Enable/Start the upgrade process on all cluster nodes.
kubectl label nodes --all rke2-upgrade=true
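One detail of the wait loop in the script above is worth knowing: if the CRD only appears on the final iteration (RETRY equal to 0), the trailing "$RETRY" -gt 0 check still fails and errexit aborts the script. A possible way around this is to let a small helper report success or failure directly. The wait_until function below is a hypothetical helper written for this sketch, not part of RKE2 or OneKE:

```shell
# Run a command repeatedly until it succeeds or the attempts are exhausted.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then
      return 0  # success counts on any attempt, including the last one
    fi
    attempts=$((attempts - 1))
    # Sleep only when another attempt will follow.
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# In the upgrade script the wait could then read (not executed here):
#   wait_until 10 5 kubectl get crd/plans.upgrade.cattle.io --no-headers
```

Because the function returns a non-zero status when every attempt fails, the script still stops under errexit if the CRD never shows up.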

Important

RKE2 needs to download various container images to perform the upgrade, so we recommend enabling access to the public Internet for the duration of the upgrade procedure.

Component Upgrade

By default, OneKE deploys Longhorn, Traefik, and MetalLB during cluster bootstrap. All of these apps are deployed as Addons using RKE2's Helm Integration and official Helm charts. To illustrate the process, let's upgrade the Traefik Helm chart from version 10.23.0 to version 10.24.0 in four basic steps:

  1. To avoid downtime, make sure there are at least 2 worker nodes so that 2 (anti-affined) Traefik replicas are running.
$ oneflow scale 'Service OneKE 1.24' worker 2
$ oneflow show 'Service OneKE 1.24'
...
LOG MESSAGES
06/30/22 21:32 [I] New state: DEPLOYING_NETS
06/30/22 21:32 [I] New state: DEPLOYING
06/30/22 21:39 [I] New state: RUNNING
06/30/22 21:54 [I] Role worker scaling up from 1 to 2 nodes
06/30/22 21:54 [I] New state: SCALING
06/30/22 21:56 [I] New state: COOLDOWN
06/30/22 22:01 [I] New state: RUNNING
$ kubectl -n traefik-system get pods
NAME                           READY   STATUS    RESTARTS   AGE
one-traefik-6768f7bdf4-cvqn2   1/1     Running   0          23m
one-traefik-6768f7bdf4-qqfcl   1/1     Running   0          23m
$ kubectl -n traefik-system get pods -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'
traefik:2.7.1
traefik:2.7.1
  2. Update Helm repositories to be able to download Traefik Helm charts.
$ helm repo add traefik https://helm.traefik.io/traefik
"traefik" has been added to your repositories

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "traefik" chart repository
Update Complete. ⎈Happy Helming!
  3. Pull the chart (version 10.24.0).
$ helm pull traefik/traefik --version '10.24.0'
  4. Patch the HelmChart/one-traefik custom resource.
$ kubectl -n kube-system patch helmchart/one-traefik --type merge --patch-file /dev/fd/0 <<EOF
{"spec": {"chartContent": "$(base64 -w0 < ./traefik-10.24.0.tgz)"}}
EOF
helmchart.helm.cattle.io/one-traefik patched
$ kubectl -n traefik-system get pods
NAME                           READY   STATUS    RESTARTS   AGE
one-traefik-7c5875d657-9v5h2   1/1     Running   0          88s
one-traefik-7c5875d657-bsp4v   1/1     Running   0          88s
$ kubectl -n traefik-system get pods -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'
traefik:2.8.0
traefik:2.8.0
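The -w0 flag used in the patch command is specific to GNU base64. On systems where it is missing, the patch can be prepared in a separate step, as in the minimal sketch below. A dummy file stands in for ./traefik-10.24.0.tgz so that the example runs without Helm, and the CHART_FILE, CHART_B64 and PATCH_FILE names are assumptions of this sketch:

```shell
# Stand-in for ./traefik-10.24.0.tgz; point CHART_FILE at the pulled chart in real use.
CHART_FILE="$(mktemp)"
printf 'dummy chart payload\n' > "$CHART_FILE"

# Encode the chart on a single line. Stripping newlines with tr avoids
# relying on the GNU-only -w0 flag.
CHART_B64="$(base64 < "$CHART_FILE" | tr -d '\n')"

# Write the merge patch to a file so it can be inspected before it is applied.
PATCH_FILE="$(mktemp)"
printf '{"spec": {"chartContent": "%s"}}\n' "$CHART_B64" > "$PATCH_FILE"

cat "$PATCH_FILE"

# Applying it would then look like this (not executed here):
#   kubectl -n kube-system patch helmchart/one-traefik --type merge --patch-file "$PATCH_FILE"
```

Base64 output contains no quotes or backslashes, so it can be placed inside the JSON string without further escaping.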

Important

RKE2 needs to download various container images to perform the upgrade, so we recommend enabling access to the public Internet for the duration of the upgrade procedure.

Important

This was a very simple and quick Helm chart upgrade, but in general config changes in the spec.valuesContent field may also be required. Please plan your upgrades ahead!
