Kubevirt: Defer eve reboot/shutdown/update until drain completes #4494

Draft: wants to merge 2 commits into base: master

@andrewd-zededa andrewd-zededa commented Dec 23, 2024

In kubevirt-eve, multiple cluster nodes each host app workloads and
volume replicas. This PR defers EVE management config operations
(reboot/shutdown/update) that would make storage replicas unavailable,
until drain completes. An example:

  1. Node 1 has an outage and recovers.
  2. Before volumes finish rebuilding on node 1, node 2 has an outage and recovers.
  3. Volumes begin rebuilding replicas on nodes 1 and 2. The only available rebuild source is on node 3.
  4. A user initiates a request to reboot/shutdown/update eve-os on node 3.
  5. That config request is deferred until replicas are rebuilt on the other nodes.
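The deferral in steps 4 and 5 can be sketched as a small state holder. This is only an illustration, not the code in this PR: `NodeOp`, `DrainStatus`, and `Deferrer` are hypothetical names, and the real change may track more drain states and failure cases.

```go
package main

// NodeOp is a hypothetical name for a management operation requested on a node.
type NodeOp string

const (
	OpReboot   NodeOp = "reboot"
	OpShutdown NodeOp = "shutdown"
	OpUpdate   NodeOp = "update"
)

// DrainStatus is a hypothetical drain state for this sketch.
type DrainStatus int

const (
	DrainNotRequested DrainStatus = iota
	DrainInProgress
	DrainComplete
)

// Deferrer holds at most one pending operation until drain completes.
type Deferrer struct {
	status  DrainStatus
	pending *NodeOp
}

// Request records op and reports whether it may run immediately.
// The first deferred request moves the drain to DrainInProgress.
func (d *Deferrer) Request(op NodeOp) (runNow bool) {
	if d.status == DrainComplete {
		return true
	}
	d.pending = &op
	if d.status == DrainNotRequested {
		d.status = DrainInProgress
	}
	return false
}

// DrainDone marks the drain complete and returns the deferred op, if any.
func (d *Deferrer) DrainDone() (NodeOp, bool) {
	d.status = DrainComplete
	if d.pending == nil {
		return "", false
	}
	op := *d.pending
	d.pending = nil
	return op, true
}
```

In this sketch a reboot requested on node 3 is held by `Request` and only handed back by `DrainDone` once the replicas on the other nodes are rebuilt.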

As part of node drain, a cordon gates all new workloads from being scheduled on that node. Removal of that gate is managed by zedkube's nodeOnBootHealthStatusWatcher(), which waits until the local Kubernetes node becomes online/ready for the first time after each boot event and then uncordons it.
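A rough sketch of that wait-then-uncordon loop follows. It does not reproduce nodeOnBootHealthStatusWatcher(): `NodeAPI`, `uncordonOnFirstReady`, and `fakeNode` are hypothetical names, and the Kubernetes calls are hidden behind an interface so the example stays self-contained.

```go
package main

import (
	"context"
	"time"
)

// NodeAPI is a hypothetical abstraction over the Kubernetes calls needed here.
type NodeAPI interface {
	IsReady(node string) (bool, error)
	Uncordon(node string) error
}

// uncordonOnFirstReady polls until the node reports ready, then uncordons it once.
// Errors from IsReady are treated as "not ready yet" and retried.
func uncordonOnFirstReady(ctx context.Context, api NodeAPI, node string, poll time.Duration) error {
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		ready, err := api.IsReady(node)
		if err == nil && ready {
			return api.Uncordon(node)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// fakeNode reports not-ready for a fixed number of polls (illustration only).
type fakeNode struct {
	pollsUntilReady int
	uncordoned      bool
}

func (f *fakeNode) IsReady(string) (bool, error) {
	f.pollsUntilReady--
	return f.pollsUntilReady < 0, nil
}

func (f *fakeNode) Uncordon(string) error {
	f.uncordoned = true
	return nil
}

// exampleUncordon runs the loop against the fake and reports whether it uncordoned.
func exampleUncordon() (bool, error) {
	f := &fakeNode{pollsUntilReady: 2}
	err := uncordonOnFirstReady(context.Background(), f, "node1", time.Millisecond)
	return f.uncordoned, err
}
```

Because the function returns after the first successful uncordon, it would need to be started again on each boot event, which matches the per-boot behavior described above.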

For EVE baseos image updates, this path waits until a new baseos image is available locally (LOADED or INSTALLED) and activated before beginning drain.
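The gating condition for updates could be expressed as a simple predicate. This is a sketch under assumptions: `ImageState`, `BaseOsImage`, and `readyToDrainForUpdate` are hypothetical names, and only the LOADED and INSTALLED states come from the description above.

```go
package main

// ImageState is a hypothetical stand-in for the baseos image status.
type ImageState string

const (
	StateLoaded    ImageState = "LOADED"
	StateInstalled ImageState = "INSTALLED"
)

// BaseOsImage is a hypothetical, reduced view of a baseos image.
type BaseOsImage struct {
	State    ImageState
	Activate bool // true once the image has been activated
}

// readyToDrainForUpdate reports whether drain may begin for an update:
// the image must be available locally and activated.
func readyToDrainForUpdate(img BaseOsImage) bool {
	local := img.State == StateLoaded || img.State == StateInstalled
	return local && img.Activate
}
```

Checking this before drain avoids cordoning a node for an update whose image is not yet usable.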

@andrewd-zededa

  ./out has been created
  Modes: GitHubActions Robot InContainer ResetRepo UnitTests
  Processing: GH:4494
  GITHUB PR #4494 is being downloaded from
  https://api.github.com/repos/lf-edge/eve/pulls/4494
    JSON data at Mon Dec 23 11:09:31 PM UTC 2024
    Patch data at Mon Dec 23 11:09:32 PM UTC 2024
  ERROR: Unsure how to process GH:4494. Permissions missing?

An odd Yetus failure; I'm not sure what to make of it.

In kubevirt-eve, multiple cluster nodes each host app workloads and
volume replicas. This implements defer for EVE management operations
which would result in unavailability of storage replicas. An example:

1. Node 1 has an outage and recovers.
2. Before volumes finish rebuilding on node 1, node 2 has an outage and recovers.
3. Volumes begin rebuilding replicas on nodes 1 and 2.
4. A user initiates a request to reboot/shutdown/update eve-os on node 3.
5. That config request is deferred until replicas are rebuilt on the other nodes.

Signed-off-by: Andrew Durbin <andrewd@zededa.com>
tidy, vendor, ...

Signed-off-by: Andrew Durbin <andrewd@zededa.com>