Kubevirt: Defer eve reboot/shutdown/update until drain completes #4494

andrewd-zededa · 2024-12-23T23:08:37Z

In kubevirt-eve, a cluster has multiple nodes, each hosting app workloads
and volume replicas. This PR defers EVE management config operations that
would make storage replicas unavailable. An example:

1. Node 1 suffers an outage and recovers.
2. Before volumes finish rebuilding on node 1, node 2 suffers an outage and recovers.
3. Volumes begin rebuilding replicas on nodes 1 and 2. The only available rebuild source is on node 3.
4. A user initiates a request to reboot/shutdown/update eve-os on node 3.
5. That config request is deferred until the replicas are rebuilt on the other nodes.
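The decision in the example can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the `Replica` type, the `shouldDefer` function, and the volume map are hypothetical names, and the rule used (defer when the node holds the last healthy replica of any volume) is an assumption drawn from the scenario above.

```go
package main

import "fmt"

// Replica is a hypothetical view of one volume replica.
// It is not an EVE or storage-layer type.
type Replica struct {
	Node    string
	Healthy bool // false while the replica is still rebuilding
}

// shouldDefer reports whether taking node offline would remove the only
// healthy replica of at least one volume.
func shouldDefer(node string, volumes map[string][]Replica) bool {
	for _, replicas := range volumes {
		hostsHealthy := false     // node holds a healthy replica of this volume
		healthyElsewhere := false // some other node holds a healthy replica
		for _, r := range replicas {
			if !r.Healthy {
				continue
			}
			if r.Node == node {
				hostsHealthy = true
			} else {
				healthyElsewhere = true
			}
		}
		if hostsHealthy && !healthyElsewhere {
			return true
		}
	}
	return false
}

func main() {
	// Nodes 1 and 2 are rebuilding; node 3 is the only rebuild source.
	volumes := map[string][]Replica{
		"vol-a": {{"node1", false}, {"node2", false}, {"node3", true}},
	}
	fmt.Println(shouldDefer("node3", volumes)) // true
	fmt.Println(shouldDefer("node1", volumes)) // false
}
```

Once the replicas on nodes 1 and 2 report healthy, the same check returns false for node 3 and the deferred request could proceed.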

As part of node drain, a cordon blocks all new workloads from being scheduled on that node. Removal of the cordon is managed by zedkube's nodeOnBootHealthStatusWatcher(), which waits until the local Kubernetes node first comes online/ready after each boot event and then uncordons it.
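The wait-then-uncordon behavior can be sketched as below. This is only an illustration of the pattern, not the body of nodeOnBootHealthStatusWatcher(): the `NodeClient` interface, `uncordonOnFirstReady`, and `fakeClient` are hypothetical stand-ins for the Kubernetes API calls, and a real watcher would sleep or watch between checks rather than poll in a tight loop.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeClient is a hypothetical interface standing in for the Kubernetes
// API calls a real watcher would make.
type NodeClient interface {
	IsReady(node string) (bool, error)
	Uncordon(node string) error
}

// uncordonOnFirstReady polls until the node reports ready, then removes the
// cordon once and returns. It gives up after maxPolls checks.
func uncordonOnFirstReady(c NodeClient, node string, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		ready, err := c.IsReady(node)
		if err != nil {
			return err
		}
		if ready {
			return c.Uncordon(node)
		}
	}
	return errors.New("node did not become ready")
}

// fakeClient reports ready on the readyAfter-th poll; for demonstration only.
type fakeClient struct {
	readyAfter int
	polls      int
	uncordoned bool
}

func (f *fakeClient) IsReady(string) (bool, error) {
	f.polls++
	return f.polls >= f.readyAfter, nil
}

func (f *fakeClient) Uncordon(string) error {
	f.uncordoned = true
	return nil
}

func main() {
	c := &fakeClient{readyAfter: 3}
	err := uncordonOnFirstReady(c, "node3", 10)
	fmt.Println(err, c.polls, c.uncordoned)
}
```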

For EVE baseos image updates, this path waits until the new baseos image is available locally (LOADED or INSTALLED) and activated before beginning the drain.
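The gating condition for the update path can be written as a small predicate. This is a sketch under assumptions: `ImageState`, `BaseOsStatus`, and `readyToDrain` are hypothetical names chosen for illustration, and "activated" is modeled as a single boolean, which may not match how the PR tracks it.

```go
package main

import "fmt"

// ImageState is a hypothetical enumeration of local image states.
type ImageState int

const (
	Downloading ImageState = iota
	Loaded
	Installed
)

// BaseOsStatus is a hypothetical summary of the new baseos image.
type BaseOsStatus struct {
	State     ImageState
	Activated bool
}

// readyToDrain reports whether the drain may begin for a baseos update:
// the image must be local (Loaded or Installed) and activated.
func readyToDrain(s BaseOsStatus) bool {
	local := s.State == Loaded || s.State == Installed
	return local && s.Activated
}

func main() {
	fmt.Println(readyToDrain(BaseOsStatus{State: Downloading, Activated: true})) // false
	fmt.Println(readyToDrain(BaseOsStatus{State: Loaded, Activated: false}))     // false
	fmt.Println(readyToDrain(BaseOsStatus{State: Installed, Activated: true}))   // true
}
```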

andrewd-zededa · 2024-12-23T23:12:30Z

  ./out has been created
  Modes: GitHubActions Robot InContainer ResetRepo UnitTests
  Processing: GH:4494
  GITHUB PR #4494 is being downloaded from
  https://api.github.com/repos/lf-edge/eve/pulls/4494
    JSON data at Mon Dec 23 11:09:31 PM UTC 2024
    Patch data at Mon Dec 23 11:09:32 PM UTC 2024
  ERROR: Unsure how to process GH:4494. Permissions missing?

An odd yetus failure, not sure what to make of this.

As a part of kubevirt-eve we have multiple cluster nodes each hosting app workloads and volume replicas. This implements defer for eve mgmt operations which will result in unavailability of storage replicas.

An example:
1. Node 1 outage and recovers.
2. Before volumes complete rebuilding on node 1: Node 2 outage and recovery.
3. Volumes begin rebuilding replicas on nodes 1 and 2.
4. User initiated request to reboot/shutdown/update eve-os on node3.
5. That config request is set to defer until replicas are rebuilt on the other nodes.

Signed-off-by: Andrew Durbin <andrewd@zededa.com>

tidy, vendor, ... Signed-off-by: Andrew Durbin <andrewd@zededa.com>

github-actions bot requested review from deitch, eriknordmark, milan-zededa, OhmSpectator, rouming and uncleDecart December 23, 2024 23:14

andrewd-zededa force-pushed the drain-node branch from 3c4957b to 9caa356 Compare December 23, 2024 23:18

andrewd-zededa force-pushed the drain-node branch 2 times, most recently from c65f122 to 465e6a0 Compare January 2, 2025 15:41

andrewd-zededa added 2 commits January 2, 2025 12:38

Updated go mod

e3d900e

tidy, vendor, ... Signed-off-by: Andrew Durbin <andrewd@zededa.com>

andrewd-zededa force-pushed the drain-node branch from 465e6a0 to e3d900e Compare January 2, 2025 19:39

andrewd-zededa mentioned this pull request Jan 3, 2025

Publish ZInfoKubeCluster and ZInfoKubeClusterUpdateStatus #4507

Draft
