This repository has been archived by the owner on Dec 8, 2023. It is now read-only.

since v0.10.0 pods (as part of deployments) are being killed every X minutes #432

Closed
kooskaspers opened this issue Apr 11, 2020 · 13 comments
Labels
kind/bug Something isn't working

Comments

@kooskaspers
Contributor

kooskaspers commented Apr 11, 2020

Version (k3OS / kernel)
k3os version v0.10.0
5.0.0-43-generic #47~18.04.1 SMP Wed Apr 1 16:27:01 UTC 2020
Architecture
x86_64

Describe the bug
Since v0.10.0, I'm experiencing pods being recreated every now and then, approximately every 15 minutes.

This does not apply to pods that are part of daemonsets. ONLY pods that are part of deployments (and of course replicasets) are being recreated:

See screenshot here:
[screenshot]

The first part shows the pods of daemonsets; the second part shows the pods of deployments/replicasets.

In one of my Grafana dashboards, you can see this behavior pretty clearly. Have a look at the missing values every approx 15 minutes (it started around 23:15):

[screenshot]

To Reproduce
It just happens approximately every 15 minutes.

Expected behavior
Stable deployments/replicasets; no pods being recreated every 15 minutes.

Actual behavior
Pods are being recreated every 15 minutes.

Additional context

@kooskaspers added the kind/bug label on Apr 11, 2020
@kooskaspers
Contributor Author

kooskaspers commented Apr 11, 2020

I have the feeling this happens because of the k3os system-upgrade-controller (image: 'rancher/system-upgrade-controller:v0.4.0').

Just scaled down the replicaset of the system-upgrade-controller deployment to 0 pods.
Let's see if this is going to solve the issue.
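
For reference, a sketch of the scale-down, assuming the deployment is named system-upgrade-controller and lives in the k3os-system namespace (names may differ per install):

# assumed deployment name/namespace: adjust to your install
kubectl scale deployment -n k3os-system system-upgrade-controller --replicas=0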

@kooskaspers
Contributor Author

Yep, that seems to solve it:

[screenshot]

Another thing: about this section of the controller documentation:

> Additionally, a value of disabled for this label on a node will cause the controller to skip over the node.

Well... this is not behaving correctly, to be honest.
I added the label by running:
kubectl label nodes -l k3os.io/mode plan.upgrade.cattle.io/k3os-latest=enabled --overwrite

And yes: I'm sure my plan's name is "k3os-latest".
But still, the upgrade controller is not respecting this label being set to disabled: it's still performing the upgrades to k3os, while I would expect it to skip the node.

@dweomer
Contributor

dweomer commented Apr 11, 2020

> Another thing: about this section of the controller documentation:
>
> > Additionally, a value of disabled for this label on a node will cause the controller to skip over the node.
>
> Well... this is not behaving correctly, to be honest.
> I added the label by running:
> kubectl label nodes -l k3os.io/mode plan.upgrade.cattle.io/k3os-latest=enabled --overwrite
>
> And yes: I'm sure my plan's name is "k3os-latest".
> But still, the upgrade controller is not respecting this label being set to disabled: it's still performing the upgrades to k3os, while I would expect it to skip the node.

@kooskaspers I suspect this is a transcription error, but plan.upgrade.cattle.io/k3os-latest=enabled is the exact opposite of what you want, I think. Did you mean plan.upgrade.cattle.io/k3os-latest=disabled?
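
For reference, the corrected form of the command would be:

kubectl label nodes -l k3os.io/mode plan.upgrade.cattle.io/k3os-latest=disabled --overwrite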


> Describe the bug
> Since v0.10.0, I'm experiencing pods being recreated every now and then, approximately every 15 minutes.
>
> This does not apply to pods that are part of daemonsets. ONLY pods that are part of deployments (and of course replicasets) are being recreated:

This tells me that the system-upgrade-controller still thinks that it has an upgrade to apply. To prevent application of the k3os-latest plan, label nodes with plan.upgrade.cattle.io/k3os-latest=disabled (as discussed above) or remove that label entirely (because the node selector on the plan requires it to exist).
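
To remove the label instead, kubectl's trailing-dash syntax works (the node name below is a placeholder):

# <node-name> is a placeholder for your actual node name
kubectl label node <node-name> plan.upgrade.cattle.io/k3os-latest-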

The SUC (system-upgrade-controller) applies a plan to a node if the node matches the selection criteria specified by the plan's .spec.nodeSelector AND the node's value of the label plan.upgrade.cattle.io/${plan.name} != ${plan.status.latestHash}, provided it isn't the special value of disabled as discussed above.
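
As a rough sketch of that check for a single-node cluster and the k3os-latest plan (assuming kubectl and jq are available, and leaving aside the .spec.nodeSelector match, which must also hold):

# compare the node's plan label against the plan's latest hash
NODE_LABEL=$(kubectl get node -o json | jq -r '.items[0].metadata.labels["plan.upgrade.cattle.io/k3os-latest"]')
PLAN_HASH=$(kubectl get -n k3os-system plan/k3os-latest -o json | jq -r .status.latestHash)
if [ "$NODE_LABEL" = "disabled" ] || [ "$NODE_LABEL" = "$PLAN_HASH" ]; then
  echo "SUC will skip this node"
else
  echo "SUC will consider this node for the plan"
fi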

When the SUC applies a plan to a node and the plan's spec.drain is non-nil, as with the provided k3os-latest plan, the pod will have a drain init container that invokes kubectl drain on the node. This is what is likely causing your outages. What I don't understand is why every 15 minutes? If the plan were failing to apply, I wouldn't be surprised to see drains happening every 15 minutes or so (assuming the pod is timing out and never restarting before the job gets recreated), but from what I can see in your screenshots the plan appears to have been applied successfully. When a plan is applied successfully (aka the job completes without error), the SUC labels the node plan.upgrade.cattle.io/${plan.name}=${plan.status.latestHash} and marks the node as schedulable in the same operation. This lets the SUC know not to consider the node for the plan again until plan.upgrade.cattle.io/${plan.name} != ${plan.status.latestHash} on the node.

For the behavior that you have described I can imagine two possible causes:

  • a CronJob (or the like) is running every 15 minutes applying plan.upgrade.cattle.io/k3os-latest=enabled
  • the SUC is running with a service account that has had its permissions curtailed down from the default cluster-admin cluster role binding (preventing it from updating the node)

@kooskaspers
Contributor Author

>> Another thing: about this section of the controller documentation:
>>
>> Additionally, a value of disabled for this label on a node will cause the controller to skip over the node.
>>
>> Well... this is not behaving correctly, to be honest.
>> I added the label by running:
>> kubectl label nodes -l k3os.io/mode plan.upgrade.cattle.io/k3os-latest=enabled --overwrite
>> And yes: I'm sure my plan's name is "k3os-latest".
>> But still, the upgrade controller is not respecting this label being set to disabled: it's still performing the upgrades to k3os, while I would expect it to skip the node.
>
> @kooskaspers I suspect this is a transcription error, but plan.upgrade.cattle.io/k3os-latest=enabled is the exact opposite of what you want, I think. Did you mean plan.upgrade.cattle.io/k3os-latest=disabled?

That's a typo indeed ;) I meant 'disabled' for sure.

>> Describe the bug
>> Since v0.10.0, I'm experiencing pods being recreated every now and then, approximately every 15 minutes.
>> This does not apply to pods that are part of daemonsets. ONLY pods that are part of deployments (and of course replicasets) are being recreated:
>
> This tells me that the system-upgrade-controller still thinks that it has an upgrade to apply. To prevent application of the k3os-latest plan, label nodes with plan.upgrade.cattle.io/k3os-latest=disabled (as discussed above) or remove that label entirely (because the node selector on the plan requires it to exist).
>
> The SUC (system-upgrade-controller) applies a plan to a node if the node matches the selection criteria specified by the plan's .spec.nodeSelector AND the node's value of the label plan.upgrade.cattle.io/${plan.name} != ${plan.status.latestHash}, provided it isn't the special value of disabled as discussed above.
>
> When the SUC applies a plan to a node and the plan's spec.drain is non-nil, as with the provided k3os-latest plan, the pod will have a drain init container that invokes kubectl drain on the node. This is what is likely causing your outages. What I don't understand is why every 15 minutes? If the plan were failing to apply, I wouldn't be surprised to see drains happening every 15 minutes or so (assuming the pod is timing out and never restarting before the job gets recreated), but from what I can see in your screenshots the plan appears to have been applied successfully. When a plan is applied successfully (aka the job completes without error), the SUC labels the node plan.upgrade.cattle.io/${plan.name}=${plan.status.latestHash} and marks the node as schedulable in the same operation. This lets the SUC know not to consider the node for the plan again until plan.upgrade.cattle.io/${plan.name} != ${plan.status.latestHash} on the node.

Just had a look at that label you're mentioning. It says:
plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3

The last time the system-upgrade-controller ran its pod, the output was:

time="2020-04-11T15:56:09Z" level=info msg="skipping \"k3os\" because destination version matches source: v0.10.0"
time="2020-04-11T15:56:09Z" level=info msg="skipping \"k3s\" because destination version matches source: v1.17.4+k3s1"
time="2020-04-11T15:56:09Z" level=info msg="skipping \"kernel\" because destination version matches source: 5.0.0-43-generic"

> For the behavior that you have described I can imagine two possible causes:
>
> * a CronJob (or the like) is running every 15 minutes applying `plan.upgrade.cattle.io/k3os-latest=enabled`

I got only one cronjob scheduled:
kubectl get cronjob -A

NAMESPACE   NAME                                   SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
backup      backup-k3s-storage-to-storage-server   0 2 * * *   False     0        19h             22d

So to my understanding, it can't be a CronJob running something every 15 minutes. And my single-node cluster seems to have been updated fine. This conclusion is based on the log I mentioned above, plus:

k3os --version
telling me I'm running v0.10.0
and
uname --kernel-release --kernel-version
telling me I'm running the latest kernel release:

5.0.0-43-generic #47~18.04.1 SMP Wed Apr 1 16:27:01 UTC 2020

So the big question is: why does the SUC still think it has to perform upgrades?

> * the SUC is running with a service account that has had its permissions curtailed down from the default `cluster-admin` cluster role binding (preventing it from updating the node)

How can I check if that's the case?

@dweomer
Contributor

dweomer commented Apr 12, 2020

Your kubectl describe clusterrolebinding/system-upgrade output should look like this (specifically, the cluster-admin Cluster Role):

k3os-4544 [~]$ kubectl describe clusterrolebinding/system-upgrade
Name:         system-upgrade
Labels:       objectset.rio.cattle.io/hash=1c0d4801041347dca2f1efe1984b515268f8cc6f
Annotations:  objectset.rio.cattle.io/applied:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{"objectset.rio.cattle.io/id":"","objec...
              objectset.rio.cattle.io/id: 
              objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
              objectset.rio.cattle.io/owner-name: system-upgrade-controller
              objectset.rio.cattle.io/owner-namespace: kube-system
Role:
  Kind:  ClusterRole
  Name:  cluster-admin
Subjects:
  Kind            Name          Namespace
  ----            ----          ---------
  ServiceAccount  k3os-upgrade  k3os-system
k3os-4544 [~]$ 

If such is the case, I do not know what is causing the behavior that you are describing. If you scale the SUC deployment/replicaset back up to 1, does the apply-k3os-latest-* job get created every 15 minutes? If it does not, and you are still experiencing the outages, then there is some other interaction going on. Regardless, if you could get me the output for the items below, it might be helpful:

  • kubectl get -o yaml -n k3os-system plan/k3os-latest # plan
  • kubectl get -o yaml -n k3os-system pod -l 'controller-uid' # deployment pod
  • kubectl logs -n k3os-system -l controller-uid --tail=-1 #deployment logs (only useful after an upgrade job has been applied)
  • kubectl get -o yaml -n k3os-system job # plan job
  • kubectl get -o yaml -n k3os-system pod -l 'job-name' # plan pod
  • kubectl get node -o json | jq -r .items[].metadata.labels # node labels

@dweomer
Contributor

dweomer commented Apr 12, 2020

> Just had a look at that label you're mentioning. It says:
> plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3

Does this match the output of kubectl get -n k3os-system plan -o json | jq -r .items[].status.latestHash?

@kooskaspers
Contributor Author

kooskaspers commented Apr 13, 2020

@dweomer here we go:

kubectl describe clusterrolebinding/system-upgrade

Name:         system-upgrade
Labels:       objectset.rio.cattle.io/hash=1c0d4801041347dca2f1efe1984b515268f8cc6f
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"system-upgrade"},"roleRef":{...
              objectset.rio.cattle.io/applied:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{"objectset.rio.cattle.io/id":"","objec...
              objectset.rio.cattle.io/id: 
              objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
              objectset.rio.cattle.io/owner-name: system-upgrade-controller
              objectset.rio.cattle.io/owner-namespace: kube-system
Role:
  Kind:  ClusterRole
  Name:  cluster-admin
Subjects:
  Kind            Name          Namespace
  ----            ----          ---------
  ServiceAccount  k3os-upgrade  k3os-system
kubectl get -o yaml -n k3os-system plan/k3os-latest # plan


apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"upgrade.cattle.io/v1","kind":"Plan","metadata":{"annotations":{},"name":"k3os-latest","namespace":"k3os-system"},"spec":{"channel":"https://github.com/rancher/k3os/releases/latest","concurrency":1,"drain":{"force":true},"nodeSelector":{"matchExpressions":[{"key":"plan.upgrade.cattle.io/k3os-latest","operator":"Exists"},{"key":"k3os.io/mode","operator":"Exists"},{"key":"k3os.io/mode","operator":"NotIn","values":["live"]}]},"serviceAccountName":"k3os-upgrade","upgrade":{"args":["upgrade","--kernel","--rootfs","--remount","--sync","--reboot","--lock-file=/host/run/k3os/upgrade.lock","--source=/k3os/system","--destination=/host/k3os/system"],"command":["k3os","--debug"],"image":"rancher/k3os"}}}
    objectset.rio.cattle.io/applied: '{"apiVersion":"upgrade.cattle.io/v1","kind":"Plan","metadata":{"annotations":{"objectset.rio.cattle.io/id":"","objectset.rio.cattle.io/owner-gvk":"k3s.cattle.io/v1,
      Kind=Addon","objectset.rio.cattle.io/owner-name":"k3os-latest","objectset.rio.cattle.io/owner-namespace":"kube-system"},"labels":{"objectset.rio.cattle.io/hash":"00af57bd5ee6dfcb90546f5b74df4eea44e2d7ec"},"name":"k3os-latest","namespace":"k3os-system"},"spec":{"channel":"https://github.com/rancher/k3os/releases/latest","concurrency":1,"drain":{"force":true},"nodeSelector":{"matchExpressions":[{"key":"plan.upgrade.cattle.io/k3os-latest","operator":"Exists"},{"key":"k3os.io/mode","operator":"Exists"},{"key":"k3os.io/mode","operator":"NotIn","values":["live"]}]},"serviceAccountName":"k3os-upgrade","upgrade":{"args":["upgrade","--kernel","--rootfs","--remount","--sync","--reboot","--lock-file=/host/run/k3os/upgrade.lock","--source=/k3os/system","--destination=/host/k3os/system"],"command":["k3os","--debug"],"image":"rancher/k3os"}}}'
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: k3os-latest
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2020-02-19T19:30:14Z"
  generation: 7
  labels:
    objectset.rio.cattle.io/hash: 00af57bd5ee6dfcb90546f5b74df4eea44e2d7ec
  name: k3os-latest
  namespace: k3os-system
  resourceVersion: "13896941"
  selfLink: /apis/upgrade.cattle.io/v1/namespaces/k3os-system/plans/k3os-latest
  uid: cd41e931-5536-44f3-9847-f303ff752c14
spec:
  channel: https://github.com/rancher/k3os/releases/latest
  concurrency: 1
  drain:
    force: true
  nodeSelector:
    matchExpressions:
    - key: plan.upgrade.cattle.io/k3os-latest
      operator: Exists
    - key: k3os.io/mode
      operator: Exists
    - key: k3os.io/mode
      operator: NotIn
      values:
      - live
  serviceAccountName: k3os-upgrade
  upgrade:
    args:
    - upgrade
    - --kernel
    - --rootfs
    - --remount
    - --sync
    - --reboot
    - --lock-file=/host/run/k3os/upgrade.lock
    - --source=/k3os/system
    - --destination=/host/k3os/system
    command:
    - k3os
    - --debug
    image: rancher/k3os
status:
  conditions:
  - lastUpdateTime: "2020-04-11T15:51:40Z"
    reason: Channel
    status: "True"
    type: LatestResolved
  latestHash: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
  latestVersion: v0.10.0
kubectl get -o yaml -n k3os-system pod -l 'controller-uid' # deployment pod                                                      
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2020-04-11T15:55:03Z"
    generateName: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d-
    labels:
      controller-uid: 324c11c9-cdd7-479b-9da0-7280252dc610
      job-name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
      plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      upgrade.cattle.io/controller: system-upgrade-controller
      upgrade.cattle.io/node: kubernetes
      upgrade.cattle.io/plan: k3os-latest
      upgrade.cattle.io/version: v0.10.0
    name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-5l8rf
    namespace: k3os-system
    ownerReferences:
    - apiVersion: batch/v1
      blockOwnerDeletion: true
      controller: true
      kind: Job
      name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
      uid: 324c11c9-cdd7-479b-9da0-7280252dc610
    resourceVersion: "13896937"
    selfLink: /api/v1/namespaces/k3os-system/pods/apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-5l8rf
    uid: f2d07e54-0e42-4013-ad7b-f6bea2857c21
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - kubernetes
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: upgrade.cattle.io/plan
              operator: In
              values:
              - k3os-latest
          topologyKey: kubernetes.io/hostname
    containers:
    - args:
      - upgrade
      - --kernel
      - --rootfs
      - --remount
      - --sync
      - --reboot
      - --lock-file=/host/run/k3os/upgrade.lock
      - --source=/k3os/system
      - --destination=/host/k3os/system
      command:
      - k3os
      - --debug
      env:
      - name: SYSTEM_UPGRADE_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: SYSTEM_UPGRADE_POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      - name: SYSTEM_UPGRADE_POD_UID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid
      - name: SYSTEM_UPGRADE_PLAN_NAME
        value: k3os-latest
      - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
        value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
        value: v0.10.0
      image: rancher/k3os:v0.10.0
      imagePullPolicy: IfNotPresent
      name: upgrade
      resources: {}
      securityContext:
        capabilities:
          add:
          - CAP_SYS_BOOT
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /host
        name: host-root
      - mountPath: /run/system-upgrade/pod
        name: pod-info
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: k3os-upgrade-token-lbqfc
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    hostIPC: true
    hostNetwork: true
    hostPID: true
    initContainers:
    - args:
      - drain
      - kubernetes
      - --pod-selector
      - '!upgrade.cattle.io/controller'
      - --ignore-daemonsets
      - --delete-local-data
      - --force
      env:
      - name: SYSTEM_UPGRADE_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: SYSTEM_UPGRADE_POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      - name: SYSTEM_UPGRADE_POD_UID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid
      - name: SYSTEM_UPGRADE_PLAN_NAME
        value: k3os-latest
      - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
        value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
        value: v0.10.0
      image: rancher/kubectl:v1.17.0
      imagePullPolicy: IfNotPresent
      name: drain
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /host
        name: host-root
      - mountPath: /run/system-upgrade/pod
        name: pod-info
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: k3os-upgrade-token-lbqfc
        readOnly: true
    nodeName: kubernetes
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: k3os-upgrade
    serviceAccountName: k3os-upgrade
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - hostPath:
        path: /
        type: Directory
      name: host-root
    - downwardAPI:
        defaultMode: 420
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.labels
          path: labels
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.annotations
          path: annotations
      name: pod-info
    - name: k3os-upgrade-token-lbqfc
      secret:
        defaultMode: 420
        secretName: k3os-upgrade-token-lbqfc
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:56:09Z"
      reason: PodCompleted
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      reason: PodCompleted
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      reason: PodCompleted
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://ece8a3da91eb60c65af5058ef9902125b0a522bb704ace225445778d7c066702
      image: docker.io/rancher/k3os:v0.10.0
      imageID: docker.io/rancher/k3os@sha256:ed256bd8e5d127e37afd4e709b64d020fa12220107e3168957c3c8ba8637eaf9
      lastState: {}
      name: upgrade
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: containerd://ece8a3da91eb60c65af5058ef9902125b0a522bb704ace225445778d7c066702
          exitCode: 0
          finishedAt: "2020-04-11T15:56:09Z"
          reason: Completed
          startedAt: "2020-04-11T15:56:09Z"
    hostIP: 192.168.1.51
    initContainerStatuses:
    - containerID: containerd://73c3ff95a8d774071c205de14ffaa075e382aac111118702b20ae2db5b749e86
      image: docker.io/rancher/kubectl:v1.17.0
      imageID: docker.io/rancher/kubectl@sha256:8a93d3b0386bf17eacea25dcd30356ebfea93ea7adfcb8c7ac6552654b4a2b4f
      lastState: {}
      name: drain
      ready: true
      restartCount: 0
      state:
        terminated:
          containerID: containerd://73c3ff95a8d774071c205de14ffaa075e382aac111118702b20ae2db5b749e86
          exitCode: 0
          finishedAt: "2020-04-11T15:56:09Z"
          reason: Completed
          startedAt: "2020-04-11T15:55:03Z"
    phase: Succeeded
    podIP: 192.168.1.51
    podIPs:
    - ip: 192.168.1.51
    qosClass: BestEffort
    startTime: "2020-04-11T15:55:03Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
kubectl logs -n k3os-system -l controller-uid --tail=-1 #deployment logs

time="2020-04-11T15:56:09Z" level=info msg="skipping \"k3os\" because destination version matches source: v0.10.0"
time="2020-04-11T15:56:09Z" level=info msg="skipping \"k3s\" because destination version matches source: v1.17.4+k3s1"
time="2020-04-11T15:56:09Z" level=info msg="skipping \"kernel\" because destination version matches source: 5.0.0-43-generic"
kubectl get -o yaml -n k3os-system job # plan job           
                                                                     
apiVersion: v1
items:
- apiVersion: batch/v1
  kind: Job
  metadata:
    annotations:
      objectset.rio.cattle.io/applied: H4sIAAAAAAAA/+xXb2/bthP+Kr/fvZb8N3ZiAX3h1d5qtHWMOh1QFEFAkSebM0VqJOXECPzdh6NkV0mcNN1eDSsCxKLIe+7u4T3k6R5y9EwwzyC5B81yhATKYmWZwHhTpmg1enTxrfTreNM3LlbMo/Mx8/GQj7p9MRB93u90hyM27McDNhRnEAUgVzBOaMHK7ZzHHCLgFpmXRl/JHJ1neQGJLpWKQLEUlaMoTPoHcu/Qt6w0Lc68V9iSpr1mbg0JdPudsyEX572R6KRpL+1e9C66WS9l/GyAvT7rpIP+cCQ6EEGhmG7V2TSAGnlAAk/TGGTdi3PsdQfi7FxkQ+ylgxHLhiw9GwwvEC9GWS8bDCGCp9DcaG+NUmghgSrp+EBnY+6UqTYi0HUk/eQqSulAap3CPgKmtfGB1xcZlOI7UT1naG412ni13Xyrjsbsthv9773U4s2CgvseSl1kzQxeY3KqnPb7CFyBnHJm3MstTpAJJTUukRstHCSjTieClPGNybIPMpcekl4EHvOCfJNhUwCvqM7/SE01mN0aVeboIPl6PCHWxvnYGkNbR88L5teBnPALbYjA7wpaOZEWuTd2R5BH+8KIWOrMQATC3OpbZsV4MSME6TGvXNVYNfMRZBKV+IQZrQrPldPj/rXqlcFPbdzUxSsQmsv3++v9/joCqaV/a7RnUqN9QIKwTFK5y5ytaGyZ5mu0bWKbe5Vsu63ueYvOIWZXZHm0eLAfcUxsOFSBJ4jg/y+WABnIlTYWY8EwN9qhr2AEKvQYK8OZikNF09vMWI5wHQHqbTP65Zfl1fTjzefFb5/Gk+nN/JL+jT9OIYItUyX+ak1+JOoEaVQdLaqvOeHtm7v7CHpxOfkh5ON26Ncgf55Nfhi4lOJl3A/j+YOQn5y4L1p+GF9Nl1c378bLdw2Av30mvM7b79NPy9nlvOFw22mNWh2gIrboTGk5ifh+H9WK/mhK7Z+VdU6zNXFtmoDT+rXIxKVWO0i8LfGRnS11++GJ1S6MCCEF1SxKpRZGSb6DBGbZ3PiFRYfahyX8pOxqoFPC6xuX1FmTdZ4zLUh3NFFLJC1XJIaDIr+BxfGGNKnCI3GQVSYWQ0bh2e00r1+mFUsxyW0TZ1Lhm8BSSJn8tQ8qpgWVddiCN9XssSGioJyXOhw7NUZzxU/l/lTuv0q5ETjkpZV+R9cm3vnQWLGCpVJJL0MmwEQQ5tvx4mb5ZXnzy+XlFVzvIyis3EqFKxRVVPuaA8+sP7qb4zZchA7tVnIcc06Rzxt95TdZh+5kNjnkSMPZ4u1hyLKM7vdd+PgxAseNscU/S2lRTEor9WrJ1yhKJfVqFi7f+vX0DnlJyj0gLOt7/Apt3cfkzPP19K6w6FzVoH+9hw3uHvRl4fvGOB92KQJToGXUDSQw04fCCAdWo3W43tMfkWbEWHv5T4L/el81WofwQ2P8fODP9JAvBt5QIcUdgTeFUWa1e/8SF1WG3ijCfRQF8d16aFdqV+XKUvWIyOmddKFTwixDTl363NTEVG6o5fXMl0Fq+78CAAD//+Vd+iIaDwAA
      objectset.rio.cattle.io/id: system-upgrade-controller
      objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
      objectset.rio.cattle.io/owner-name: k3os-latest
      objectset.rio.cattle.io/owner-namespace: k3os-system
    creationTimestamp: "2020-02-19T19:32:08Z"
    labels:
      objectset.rio.cattle.io/hash: 13046cd729d0bb2b18281f2bac45e23a0b5369d0
      plan.upgrade.cattle.io/k3os-latest: 6c913d5d3c30169a635f187e215d47df6e2b59af6ab4568ee89f2f56
      upgrade.cattle.io/controller: system-upgrade-controller
      upgrade.cattle.io/node: kubernetes
      upgrade.cattle.io/plan: k3os-latest
    name: upgrade-kubernetes-with-k3os-latest-at-6c913d5d3c30169a63-5a6d4
    namespace: k3os-system
    resourceVersion: "5292402"
    selfLink: /apis/batch/v1/namespaces/k3os-system/jobs/upgrade-kubernetes-with-k3os-latest-at-6c913d5d3c30169a63-5a6d4
    uid: 00ddc1c6-ea31-407a-bf76-bc98e4d13135
  spec:
    activeDeadlineSeconds: 900
    backoffLimit: 2
    completions: 1
    parallelism: 1
    selector:
      matchLabels:
        controller-uid: 00ddc1c6-ea31-407a-bf76-bc98e4d13135
    template:
      metadata:
        creationTimestamp: null
        labels:
          controller-uid: 00ddc1c6-ea31-407a-bf76-bc98e4d13135
          job-name: upgrade-kubernetes-with-k3os-latest-at-6c913d5d3c30169a63-5a6d4
          plan.upgrade.cattle.io/k3os-latest: 6c913d5d3c30169a635f187e215d47df6e2b59af6ab4568ee89f2f56
          upgrade.cattle.io/controller: system-upgrade-controller
          upgrade.cattle.io/node: kubernetes
          upgrade.cattle.io/plan: k3os-latest
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - kubernetes
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: upgrade.cattle.io/plan
                  operator: In
                  values:
                  - k3os-latest
              topologyKey: kubernetes.io/hostname
        containers:
        - args:
          - upgrade
          - --kernel
          - --rootfs
          - --remount
          - --sync
          - --reboot
          - --lock-file=/host/run/k3os/upgrade.lock
          - --source=/k3os/system
          - --destination=/host/k3os/system
          command:
          - k3os
          - --debug
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 6c913d5d3c30169a635f187e215d47df6e2b59af6ab4568ee89f2f56
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.9.0
          image: rancher/k3os:v0.9.0
          imagePullPolicy: IfNotPresent
          name: upgrade
          resources: {}
          securityContext:
            capabilities:
              add:
              - CAP_SYS_BOOT
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        dnsPolicy: ClusterFirst
        hostIPC: true
        hostPID: true
        initContainers:
        - args:
          - drain
          - kubernetes
          - --pod-selector
          - '!upgrade.cattle.io/controller'
          - --ignore-daemonsets
          - --delete-local-data
          - --force
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 6c913d5d3c30169a635f187e215d47df6e2b59af6ab4568ee89f2f56
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.9.0
          image: rancher/kubectl:v1.17.0
          imagePullPolicy: IfNotPresent
          name: drain
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        restartPolicy: Never
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: k3os-upgrade
        serviceAccountName: k3os-upgrade
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
          operator: Exists
        volumes:
        - hostPath:
            path: /
            type: Directory
          name: host-root
        - downwardAPI:
            defaultMode: 420
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.labels
              path: labels
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.annotations
              path: annotations
          name: pod-info
  status:
    completionTime: "2020-02-19T19:37:44Z"
    conditions:
    - lastProbeTime: "2020-02-19T19:37:44Z"
      lastTransitionTime: "2020-02-19T19:37:44Z"
      status: "True"
      type: Complete
    startTime: "2020-02-19T19:32:08Z"
    succeeded: 1
- apiVersion: batch/v1
  kind: Job
  metadata:
    annotations:
      objectset.rio.cattle.io/applied: H4sIAAAAAAAA/+xX3W7bOBN9le+ba8n/qWMBvfDW3q3R1jHqdIGiCAKKHNlcU6SWpJwYgd99MZTsylnnp7tXCxQFGkmcOTNzOIccP0COngnmGSQPoFmOkAArCrWLN33jYsU8Oh8bHW/KFK1Gjy6+k34d9waj0eBi2BniYNhlvNdFnsZdMbwUEAUgVzBOaAHH7ZzHHCLgFpmXRl/LHJ1neQGJLpWKQLEUlaMsTPoHcu/Qt6w0Lc68V9iSpr1mbg0JdPudwRsuhr2R6KRpL+1e9i67WS9lfHCBvT7rpBf9NyPRgQgKxXSrLFaWCWwANSqDBM4VcpkR9mWvd8H6aR87Q9HtdQbpQAwF7+Kwl42Qyvw7NDfaW6MUWkigKjqureLG2jlXbUSg68jzWSsq6UBqXcI5sy1aJw1ZbjutUasL+wiY1sYH8p+lWYoXUn/K0dxptPFqu4HkXEbd6H8fpBZvF1TBSyh1J56W+bLLuZ7b7yNwBXKqmXEvtzhBJpTUuERutHCQjDqdCFLGNybLPspcekh6EXjMC4pNjk2VvKKFfzbe98Zr0L81qszRQfLteNasjfOxNYbQ6HnB/DowGP5CGyLwu4IsJ9Ii98buCPLoXxgRS50ZiECYO33HrBgvZoQgPeZVqBqr3p4IMolKfMaMrMJzFfS4ya3aMsSpnZvieQVC03y/v9nvbyKQWvp3RnsmNdoTEoRlkjQhc7aid8s0X6Nt05Zwr5Jtt9UdtuhEY3ZFnkePk02LY2LDoQo8QQT/f7ZPyEGutLEYC4a50Q59BSNQocdYGc5UHNqevmbGcoSbCFBvm9kvvy6vp59uvyx++zyeTG/nV/Tf+NMUItgyVeKv1uRHos6QRt3RoiacE96+ubuPoBdXkx9CPm6Hfg3yl9nkh4FLKZ7H/Tien6T8SEIveH4cX0+X17fvx8v3DYB/fHC8Ltrv08/L2dW8EfAg5ZsILDpTWk4ifthHtaI/mVL7J2Wd02pNXJsW4Lx+LTJxpdUOEm9LfORnS90+PdbahREhpaCaRanUwijJd5DALJsbv7DoUPtgws/KrgY6J7y+cUldNXnnOdOCdEcLtUTSckViOCjyO1gcb0iTKjwSB1nlYjFUFJ7dTvP6Y1qxFJPcNnEmFb4NLIWSKV77oGIyqLzDFrytVo+jFSXlvNTh2KkxmhY/lftTuf8p5UbgkJdW+h1dm3jvw/TFCpZKJb0MlQATQZjvxovb5dfl7S9XV9dws4+gsHIrFa5QVFntaw48s/4Ybo7bcBE6tFvJccw5ZT5vDJ/fZU31z9HfGbs51BkGltmk+TpbvDu8siyjK38XflkZgePGu8U/S2lRTEor9WrJ1yhKJfVqFu7j+vP0HnnpwzRVISzrq/0abT3a5Mzz9fS+sOhcNdh/e4AN7k7mufDjyTgfNi4CU6BlNCAkMNOHXglnWGOauNnTP+LRiLH28t8k/+2hmr0O6YeB+unEn5g9n028IUzKOwJvCqPMavfhOS6qCr1RhPsoC+K7depXalfVylL1iMjpvXRheMIsQ07T/dzUxFRhaAr2zJdBffu/AgAA//+Lg8nddw8AAA
      objectset.rio.cattle.io/id: system-upgrade-controller
      objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
      objectset.rio.cattle.io/owner-name: k3os-latest
      objectset.rio.cattle.io/owner-namespace: k3os-system
    creationTimestamp: "2020-03-11T16:38:45Z"
    labels:
      objectset.rio.cattle.io/hash: 13046cd729d0bb2b18281f2bac45e23a0b5369d0
      plan.upgrade.cattle.io/k3os-latest: 249945707e471ac21ecb8fd0bb8225a3b3e07d1204b4d7dc1e72f9ed
      upgrade.cattle.io/controller: system-upgrade-controller
      upgrade.cattle.io/node: kubernetes
      upgrade.cattle.io/plan: k3os-latest
      upgrade.cattle.io/version: v0.9.1
    name: apply-k3os-latest-on-kubernetes-with-249945707e471ac21ecb-1d78d
    namespace: k3os-system
    resourceVersion: "8727732"
    selfLink: /apis/batch/v1/namespaces/k3os-system/jobs/apply-k3os-latest-on-kubernetes-with-249945707e471ac21ecb-1d78d
    uid: 5400185a-3e12-43b2-aa1b-d709e2829596
  spec:
    activeDeadlineSeconds: 900
    backoffLimit: 2
    completions: 1
    parallelism: 1
    selector:
      matchLabels:
        controller-uid: 5400185a-3e12-43b2-aa1b-d709e2829596
    template:
      metadata:
        creationTimestamp: null
        labels:
          controller-uid: 5400185a-3e12-43b2-aa1b-d709e2829596
          job-name: apply-k3os-latest-on-kubernetes-with-249945707e471ac21ecb-1d78d
          plan.upgrade.cattle.io/k3os-latest: 249945707e471ac21ecb8fd0bb8225a3b3e07d1204b4d7dc1e72f9ed
          upgrade.cattle.io/controller: system-upgrade-controller
          upgrade.cattle.io/node: kubernetes
          upgrade.cattle.io/plan: k3os-latest
          upgrade.cattle.io/version: v0.9.1
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - kubernetes
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: upgrade.cattle.io/plan
                  operator: In
                  values:
                  - k3os-latest
              topologyKey: kubernetes.io/hostname
        containers:
        - args:
          - upgrade
          - --kernel
          - --rootfs
          - --remount
          - --sync
          - --reboot
          - --lock-file=/host/run/k3os/upgrade.lock
          - --source=/k3os/system
          - --destination=/host/k3os/system
          command:
          - k3os
          - --debug
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 249945707e471ac21ecb8fd0bb8225a3b3e07d1204b4d7dc1e72f9ed
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.9.1
          image: rancher/k3os:v0.9.1
          imagePullPolicy: IfNotPresent
          name: upgrade
          resources: {}
          securityContext:
            capabilities:
              add:
              - CAP_SYS_BOOT
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        dnsPolicy: ClusterFirst
        hostIPC: true
        hostNetwork: true
        hostPID: true
        initContainers:
        - args:
          - drain
          - kubernetes
          - --pod-selector
          - '!upgrade.cattle.io/controller'
          - --ignore-daemonsets
          - --delete-local-data
          - --force
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 249945707e471ac21ecb8fd0bb8225a3b3e07d1204b4d7dc1e72f9ed
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.9.1
          image: rancher/kubectl:v1.17.0
          imagePullPolicy: IfNotPresent
          name: drain
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        restartPolicy: Never
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: k3os-upgrade
        serviceAccountName: k3os-upgrade
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
          operator: Exists
        volumes:
        - hostPath:
            path: /
            type: Directory
          name: host-root
        - downwardAPI:
            defaultMode: 420
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.labels
              path: labels
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.annotations
              path: annotations
          name: pod-info
  status:
    completionTime: "2020-03-11T16:41:04Z"
    conditions:
    - lastProbeTime: "2020-03-11T16:41:04Z"
      lastTransitionTime: "2020-03-11T16:41:04Z"
      status: "True"
      type: Complete
    startTime: "2020-03-11T16:38:45Z"
    succeeded: 1
- apiVersion: batch/v1
  kind: Job
  metadata:
    annotations:
      objectset.rio.cattle.io/applied: H4sIAAAAAAAA/+xXXW/bOhL9K7vzLDmynaS2gD54a3drtHWMOl2gKIKAIkc21xSpS1JOjMD//WIo2ZVT56P3Pl2gCBBL4szhzOEccvgABXommGeQPoBmBUIKrCzVNl73jYsV8+h8bHS8rjK0Gj26+E76VXzRH/LLyyEm/b7onXd7WcKG8UWP9QVEAciVjBNawHFb57GACLhF5qXR17JA51lRQqorpSJQLEPlKAqT/R+5d+g7VpoOZ94r7EhztmJuBSl0+8n5JRdvekORZFkv6w56g27eyxg/v8BenyXZRf9yKBKIoFRMd6pyaZnAFlArM0jhVCLDAXaH4s1gOEiGOOAs6+ZdwS8vkkHS4+L84nzA+xDBz9DcaG+NUmghhTrpuLGKW2OnXLURga4DzyetKKU9qU0Kp8w2aJ00ZLlJOt2kk8AuAqa18YH9Z3mW4oXYn3I0dxptvNysIT0VUjf610epxds5pfASSlOKx3m+7HK66H4OxnsVO+RGCxez3KONc6mlWyHlPkwS2O0icCVyIopxLzc4RiaU1Lio3SAdJkkEGeNrk+efZCE9pL0IPBYlBUyObW29ovB/l2urXFv8b4yqCnSQfj9sUSvjfGyNITh6njO/ChSGXziDCPy2JMuxtMi9sVuCPPiXRsRS5wYiEOZO3zErRvMpIUiPRT1Vg9WsTwS5RCW+YE5W4bme9LDKncYyzNM4tyX3CoS2+W53s9vdRCC19O+M9kxqtEckCMskKUkWbEnvlmm+QntGa8K9SjfdTvdNhzZCZpfkefA4WrU4JjYcqsATRPDvZwuFHORSG4uxYFgY7dDXMAIVeoyV4UzFoe7pa24sR7iJAPWmHf3i2+J68vn26/y/X0bjye3siv6NPk8ggg1TFb63pjgQdYI0qo4OVeGM8Hbt1X0EPb8a/xLyYTn0a5C/Tse/DFxJ8Tzup9HsKORHGnrB89PoerK4vv0wWnxoAfzlneN1s/1v8mUxvZq1Jjxo+SYCi85UlpOKH3ZRI+nPptL+SV0XNNowd0YDcFrAFpm40moLqbcVPvKzlT473tjOSiNCSEE280qpuVGSbyGFaT4zfm7RofbBhJ/UXQN0Snl949J92uReFEwLUh6NNCLJqiXJYa/JH2hxvCZVqvBIJOS1i8WQUnh2W82bj1lNU0yCW8e5VPg20BRypvnO9jomg9o7rMHbevRwPFJQzksdNp4Go23xW7u/tfvP0m4EDnllpd/SyYn3PnRgrGSZVNLLkAkwEZT5bjS/XXxb3P7n6uoabnYRlFZupMIl9YIU1a7hwDPrD9PNcBPOQod2IzmOOKfIZ62u9YeuKf8Z+jtj1/s8Q88yHbdfp/N3+1eWUzfqt+FOZgSOWu8W/6ikRTGurNTLBV+hqJTUy2k4kpvPk3vklQ8dVY2waE73a7RNd1Mwz1eT+9Kic/WN4PsDrHF71NOFa5dxPixcBKZEy6hHSGGq98USNrFWQ3Gzoz/i0YiR9vLvBP/9oW6/9uGHpvrpwJ/oP58NvKVMijsCb0qjzHL78Tku6gy9UYT7KAriu3PsV2lX58oy9YjIyb10oX/CPEdOHf7MNMQcpvGquXSM6Kry/nBTGSYJtcme+Spoc/dnAAAA//84zTIXzw8AAA
      objectset.rio.cattle.io/id: system-upgrade-controller
      objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
      objectset.rio.cattle.io/owner-name: k3os-latest
      objectset.rio.cattle.io/owner-namespace: k3os-system
      upgrade.cattle.io/ttl-seconds-after-finished: "900"
    creationTimestamp: "2020-04-11T15:55:03Z"
    labels:
      objectset.rio.cattle.io/hash: 13046cd729d0bb2b18281f2bac45e23a0b5369d0
      plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      upgrade.cattle.io/controller: system-upgrade-controller
      upgrade.cattle.io/node: kubernetes
      upgrade.cattle.io/plan: k3os-latest
      upgrade.cattle.io/version: v0.10.0
    name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
    namespace: k3os-system
    resourceVersion: "13896938"
    selfLink: /apis/batch/v1/namespaces/k3os-system/jobs/apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
    uid: 324c11c9-cdd7-479b-9da0-7280252dc610
  spec:
    activeDeadlineSeconds: 900
    backoffLimit: 2
    completions: 1
    parallelism: 1
    selector:
      matchLabels:
        controller-uid: 324c11c9-cdd7-479b-9da0-7280252dc610
    template:
      metadata:
        creationTimestamp: null
        labels:
          controller-uid: 324c11c9-cdd7-479b-9da0-7280252dc610
          job-name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
          plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
          upgrade.cattle.io/controller: system-upgrade-controller
          upgrade.cattle.io/node: kubernetes
          upgrade.cattle.io/plan: k3os-latest
          upgrade.cattle.io/version: v0.10.0
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - kubernetes
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: upgrade.cattle.io/plan
                  operator: In
                  values:
                  - k3os-latest
              topologyKey: kubernetes.io/hostname
        containers:
        - args:
          - upgrade
          - --kernel
          - --rootfs
          - --remount
          - --sync
          - --reboot
          - --lock-file=/host/run/k3os/upgrade.lock
          - --source=/k3os/system
          - --destination=/host/k3os/system
          command:
          - k3os
          - --debug
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.10.0
          image: rancher/k3os:v0.10.0
          imagePullPolicy: IfNotPresent
          name: upgrade
          resources: {}
          securityContext:
            capabilities:
              add:
              - CAP_SYS_BOOT
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        dnsPolicy: ClusterFirst
        hostIPC: true
        hostNetwork: true
        hostPID: true
        initContainers:
        - args:
          - drain
          - kubernetes
          - --pod-selector
          - '!upgrade.cattle.io/controller'
          - --ignore-daemonsets
          - --delete-local-data
          - --force
          env:
          - name: SYSTEM_UPGRADE_NODE_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: spec.nodeName
          - name: SYSTEM_UPGRADE_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: SYSTEM_UPGRADE_POD_UID
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.uid
          - name: SYSTEM_UPGRADE_PLAN_NAME
            value: k3os-latest
          - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
            value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
          - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
            value: v0.10.0
          image: rancher/kubectl:v1.17.0
          imagePullPolicy: IfNotPresent
          name: drain
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /host
            name: host-root
          - mountPath: /run/system-upgrade/pod
            name: pod-info
            readOnly: true
        restartPolicy: Never
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: k3os-upgrade
        serviceAccountName: k3os-upgrade
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: node.kubernetes.io/unschedulable
          operator: Exists
        volumes:
        - hostPath:
            path: /
            type: Directory
          name: host-root
        - downwardAPI:
            defaultMode: 420
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.labels
              path: labels
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.annotations
              path: annotations
          name: pod-info
  status:
    completionTime: "2020-04-11T15:56:10Z"
    conditions:
    - lastProbeTime: "2020-04-11T15:56:10Z"
      lastTransitionTime: "2020-04-11T15:56:10Z"
      status: "True"
      type: Complete
    startTime: "2020-04-11T15:55:03Z"
    succeeded: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
kubectl get -o yaml -n k3os-system pod -l 'job-name' # plan pod           
                                                       
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2020-04-11T15:55:03Z"
    generateName: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d-
    labels:
      controller-uid: 324c11c9-cdd7-479b-9da0-7280252dc610
      job-name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
      plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      upgrade.cattle.io/controller: system-upgrade-controller
      upgrade.cattle.io/node: kubernetes
      upgrade.cattle.io/plan: k3os-latest
      upgrade.cattle.io/version: v0.10.0
    name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-5l8rf
    namespace: k3os-system
    ownerReferences:
    - apiVersion: batch/v1
      blockOwnerDeletion: true
      controller: true
      kind: Job
      name: apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-52a3d
      uid: 324c11c9-cdd7-479b-9da0-7280252dc610
    resourceVersion: "13896937"
    selfLink: /api/v1/namespaces/k3os-system/pods/apply-k3os-latest-on-kubernetes-with-539c669e033d2412b0a9-5l8rf
    uid: f2d07e54-0e42-4013-ad7b-f6bea2857c21
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - kubernetes
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: upgrade.cattle.io/plan
              operator: In
              values:
              - k3os-latest
          topologyKey: kubernetes.io/hostname
    containers:
    - args:
      - upgrade
      - --kernel
      - --rootfs
      - --remount
      - --sync
      - --reboot
      - --lock-file=/host/run/k3os/upgrade.lock
      - --source=/k3os/system
      - --destination=/host/k3os/system
      command:
      - k3os
      - --debug
      env:
      - name: SYSTEM_UPGRADE_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: SYSTEM_UPGRADE_POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      - name: SYSTEM_UPGRADE_POD_UID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid
      - name: SYSTEM_UPGRADE_PLAN_NAME
        value: k3os-latest
      - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
        value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
        value: v0.10.0
      image: rancher/k3os:v0.10.0
      imagePullPolicy: IfNotPresent
      name: upgrade
      resources: {}
      securityContext:
        capabilities:
          add:
          - CAP_SYS_BOOT
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /host
        name: host-root
      - mountPath: /run/system-upgrade/pod
        name: pod-info
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: k3os-upgrade-token-lbqfc
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    hostIPC: true
    hostNetwork: true
    hostPID: true
    initContainers:
    - args:
      - drain
      - kubernetes
      - --pod-selector
      - '!upgrade.cattle.io/controller'
      - --ignore-daemonsets
      - --delete-local-data
      - --force
      env:
      - name: SYSTEM_UPGRADE_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: SYSTEM_UPGRADE_POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      - name: SYSTEM_UPGRADE_POD_UID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid
      - name: SYSTEM_UPGRADE_PLAN_NAME
        value: k3os-latest
      - name: SYSTEM_UPGRADE_PLAN_LATEST_HASH
        value: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
      - name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
        value: v0.10.0
      image: rancher/kubectl:v1.17.0
      imagePullPolicy: IfNotPresent
      name: drain
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /host
        name: host-root
      - mountPath: /run/system-upgrade/pod
        name: pod-info
        readOnly: true
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: k3os-upgrade-token-lbqfc
        readOnly: true
    nodeName: kubernetes
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: k3os-upgrade
    serviceAccountName: k3os-upgrade
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - hostPath:
        path: /
        type: Directory
      name: host-root
    - downwardAPI:
        defaultMode: 420
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.labels
          path: labels
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.annotations
          path: annotations
      name: pod-info
    - name: k3os-upgrade-token-lbqfc
      secret:
        defaultMode: 420
        secretName: k3os-upgrade-token-lbqfc
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:56:09Z"
      reason: PodCompleted
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      reason: PodCompleted
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      reason: PodCompleted
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2020-04-11T15:55:03Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://ece8a3da91eb60c65af5058ef9902125b0a522bb704ace225445778d7c066702
      image: docker.io/rancher/k3os:v0.10.0
      imageID: docker.io/rancher/k3os@sha256:ed256bd8e5d127e37afd4e709b64d020fa12220107e3168957c3c8ba8637eaf9
      lastState: {}
      name: upgrade
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: containerd://ece8a3da91eb60c65af5058ef9902125b0a522bb704ace225445778d7c066702
          exitCode: 0
          finishedAt: "2020-04-11T15:56:09Z"
          reason: Completed
          startedAt: "2020-04-11T15:56:09Z"
    hostIP: 192.168.1.51
    initContainerStatuses:
    - containerID: containerd://73c3ff95a8d774071c205de14ffaa075e382aac111118702b20ae2db5b749e86
      image: docker.io/rancher/kubectl:v1.17.0
      imageID: docker.io/rancher/kubectl@sha256:8a93d3b0386bf17eacea25dcd30356ebfea93ea7adfcb8c7ac6552654b4a2b4f
      lastState: {}
      name: drain
      ready: true
      restartCount: 0
      state:
        terminated:
          containerID: containerd://73c3ff95a8d774071c205de14ffaa075e382aac111118702b20ae2db5b749e86
          exitCode: 0
          finishedAt: "2020-04-11T15:56:09Z"
          reason: Completed
          startedAt: "2020-04-11T15:55:03Z"
    phase: Succeeded
    podIP: 192.168.1.51
    podIPs:
    - ip: 192.168.1.51
    qosClass: BestEffort
    startTime: "2020-04-11T15:55:03Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

kubectl get node -o json | jq -r .items[].metadata.labels # node labels
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "k3s",
  "beta.kubernetes.io/os": "linux",
  "k3os.io/mode": "local",
  "k3os.io/version": "v0.10.0",
  "k3s.io/hostname": "kubernetes",
  "k3s.io/internal-ip": "192.168.1.51",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "kubernetes",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": "true",
  "node.kubernetes.io/instance-type": "k3s",
  "plan.upgrade.cattle.io/k3os-latest": "539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3"
}

Just had a look at the label you mentioned. It says:
plan.upgrade.cattle.io/k3os-latest: 539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3

Does this match the output of kubectl get -n k3os-system plan -o json | jq -r .items[].status.latestHash?

Yes, it does:

kubectl get -n k3os-system plan -o json | jq -r .items[].status.latestHash                                                       
539c669e033d2412b0a998e19d789809e8cab1f1dc650802cd4548c3
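
For anyone wanting to check this in one go, comparing the node label against the plan's latest hash should work (a minimal sketch, assuming a single node named kubernetes and the stock k3os-latest plan):

# label value on the node (dots in the label key are escaped for jsonpath):
NODE_LABEL=$(kubectl get node kubernetes -o jsonpath='{.metadata.labels.plan\.upgrade\.cattle\.io/k3os-latest}')
# latest hash computed by the SUC for the plan:
PLAN_HASH=$(kubectl get -n k3os-system plan k3os-latest -o jsonpath='{.status.latestHash}')
[ "$NODE_LABEL" = "$PLAN_HASH" ] && echo "node label matches plan hash" || echo "mismatch"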

Let me know if you want me to scale the SUC replicaset back up to 1, and if so, whether you need additional logging.

@dweomer
Contributor

dweomer commented Apr 13, 2020

@kooskaspers I think the old upgrade jobs (from previous versions of the SUC) are confusing the current version/deployment of the SUC. Please delete all existing jobs in k3os-system namespace. I will attempt to replicate such a setup (1 or 2 dangling jobs from old upgrades) and let you know if it should be safe to scale up your SUC replicaset.

@andyschmid

Just to chime in, I am also seeing this issue on my k3os cluster. I've scaled down the SUC replicaset to 0 and removed all upgrade jobs from the system.

@dweomer
Contributor

dweomer commented Apr 13, 2020

@kooskaspers and @andyschmid I appreciate your patience and willingness to work with me in tracking this one down. I was able to replicate the flapping. It happens because of an unforeseen interaction between the latest SUC and legacy SUC upgrade jobs: the controller will sometimes pick up a label from one of the legacy jobs and run the v0.10.0 apply again, which does nothing other than the drain. After upgrading to v0.10.0 and making sure the node is labeled accordingly, e.g. plan.upgrade.cattle.io/${plan.name}=${plan.status.latestHash}, you can safely delete all jobs in the k3os-system namespace:

kubectl delete job -n k3os-system --all
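
If you'd like to see what's dangling before deleting, listing the jobs with their labels shows which plan each job belongs to (a minimal sketch; the exact label keys the SUC applies, such as upgrade.cattle.io/plan, are an assumption on my part):

kubectl get job -n k3os-system --show-labels
# or, filtering on the (assumed) plan label:
kubectl get job -n k3os-system -l upgrade.cattle.io/plan=k3os-latest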

This is a work-around until I can spend some time with the SUC code (likely later this week). I will submit an issue there pointing to this one.

@kooskaspers
Contributor Author

kooskaspers commented Apr 18, 2020

@dweomer thanks for having a detailed look at this issue! I made some time this morning to test your workaround (I was a bit busy the last couple of days).

Deleted all jobs:

kubectl delete job -n k3os-system --all

All jobs are gone now:

kubectl get job -n k3os-system                                                                                    
No resources found in k3os-system namespace.

Scaled the deployment back up:

kubectl scale deployments system-upgrade-controller --replicas=1

The controller starts running again:

kubectl get all                                                                                                   
NAME                                             READY   STATUS    RESTARTS   AGE
pod/system-upgrade-controller-69c695c7fc-qrrnz   1/1     Running   0          45s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/system-upgrade-controller   1/1     1            1           58d

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/system-upgrade-controller-69c695c7fc   1         1         1       6d14h

kubectl logs -f system-upgrade-controller-69c695c7fc-qrrnz
W0418 06:24:37.760985       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-04-18T06:24:37Z" level=info msg="Updating CRD plans.upgrade.cattle.io"
time="2020-04-18T06:24:38Z" level=info msg="Starting /v1, Kind=Node controller"
time="2020-04-18T06:24:38Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2020-04-18T06:24:38Z" level=info msg="Starting batch/v1, Kind=Job controller"
time="2020-04-18T06:24:38Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"

And after a while, I still see no 'apply-k3os-latest-[...]' pods being scheduled:

kubectl get job -n k3os-system                                                                                    
No resources found in k3os-system namespace.

Looks like we tamed the upgrade controller!

Question: I don't want the controller to upgrade my kubernetes node overnight. I want to be fully aware of an upgrade taking place, so that when issues arise, I know where to look. FYI: I'm running lighting controllers, ventilation control, heating, DNS, websites and such on a single k8s node, and the wife is not amused when everything is down. So what's the best practice to disable the controller and enable it only when I want? Scaling down the pods? Using the label trick (setting it to 'disabled')? Or hardcoding the current version (like 'v0.10.0') in the spec section of the upgrade plan? Just curious what your strategy would be.

@dweomer
Contributor

dweomer commented Apr 18, 2020

@kooskaspers wrote:

Question: I don't want the controller to upgrade my kubernetes node overnight. I want to be fully aware of an upgrade taking place, so that when issues arise, I know where to look. FYI: I'm running lighting controllers, ventilation control, heating, DNS, websites and such on a single k8s node, and the wife is not amused when everything is down. So what's the best practice to disable the controller and enable it only when I want? Scaling down the pods? Using the label trick (setting it to 'disabled')? Or hardcoding the current version (like 'v0.10.0') in the spec section of the upgrade plan? Just curious what your strategy would be.

The most reliable way to prevent the SUC from applying a plan to a node is to make sure that the node does not meet the plan's selection criteria. This means removing the plan.upgrade.cattle.io/k3os-latest label from the node, or at least setting its value to disabled. This works as long as there are no completed apply jobs in the k3os-system namespace that are missing an annotation introduced with SUC v0.4.0 (which would trigger the bug described in rancher/system-upgrade-controller#58). Because you've deleted the old jobs, this will just work.
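
Concretely, either of these should do it (a sketch, assuming your node is named kubernetes as in the output above):

# remove the label entirely; the trailing dash deletes it:
kubectl label node kubernetes plan.upgrade.cattle.io/k3os-latest-
# or keep the label but set it to disabled so the plan skips the node:
kubectl label node kubernetes plan.upgrade.cattle.io/k3os-latest=disabled --overwrite

And if you'd rather pin the plan to the version you're already running (the third option you mention), setting spec.version on the plan should keep the controller from resolving anything newer. Treat this as a sketch of the idea, since clearing the channel via a null merge-patch key is an assumption about how the stock plan is configured:

kubectl -n k3os-system patch plan k3os-latest --type merge -p '{"spec":{"channel":null,"version":"v0.10.0"}}'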

@dweomer
Contributor

dweomer commented Apr 18, 2020

I hope to fix rancher/system-upgrade-controller#58 next week and push out a bugfix release for k3OS that includes it.
