This repository was archived by the owner on Sep 18, 2020. It is now read-only.
Description

I have CLUO running on my K8s cluster, with CoreOS on both the controller and worker nodes.
Reboots triggered on the controller nodes complete successfully, but reboots on the worker nodes hang indefinitely. For example, here is the agent log from a worker node:
I0406 16:33:58.958962 1 agent.go:84] Setting info labels
I0406 16:33:58.988317 1 agent.go:98] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-in-progress":"false", "container-linux-update.v1.coreos.com/reboot-needed":"false"}
I0406 16:34:46.834224 1 agent.go:110] Marking node as schedulable
I0406 16:34:46.858979 1 agent.go:120] Waiting for ok-to-reboot from controller...
I0406 16:34:46.859150 1 agent.go:246] Beginning to watch update_engine status
I0406 16:34:46.860756 1 agent.go:198] Updating status
I0406 16:34:46.860780 1 agent.go:210] Indicating a reboot is needed
I0406 16:35:51.649523 1 agent.go:134] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-in-progress":"true"}
I0406 16:35:51.682890 1 agent.go:146] Marking node as unschedulable
I0406 16:35:51.701994 1 agent.go:151] Getting pod list for deletion
I0406 16:35:51.761232 1 agent.go:160] Deleting 4 pods
. . .
<all pods deleted>
. . .
I0406 16:36:32.164977 1 agent.go:184] Node drained, rebooting
Once the drain completes, the node is left cordoned and should reboot, but the reboot itself never happens.
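To show exactly where the agent stops, the log can be parsed and the last step it recorded pulled out. This is only a rough sketch: the helper names `parse_line` and `last_step` are made up for illustration, and the regex assumes the klog-style header seen in the log above (severity and date, time, thread id, `file:line]`, message).

```python
import re

# Assumed header layout, based on the log lines above:
#   I0406 16:36:32.164977 1 agent.go:184] Node drained, rebooting
LINE_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def last_step(log_text):
    """Return the last line that parses as a log entry, skipping anything else."""
    last = None
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry is not None:
            last = entry
    return last


sample = """\
I0406 16:35:51.761232 1 agent.go:160] Deleting 4 pods
. . .
I0406 16:36:32.164977 1 agent.go:184] Node drained, rebooting
"""

entry = last_step(sample)
print(entry["src"], "-", entry["msg"])
```

On the log above, the last entry is the `agent.go:184` line, so whatever goes wrong happens after the agent has logged that it is rebooting.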
Where should I check first to help debug this?
Thanks!