Upgrades to 123.0.0 can fail after unneeded kubelet restart #3827
Labels: kind:bug (Something isn't working), topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s), topic:salt (Everything related to SaltStack in our product)
Component: salt
What happened:
On a 3-node upgrade, where multiple registry "replicas" were configured but only the bootstrap node (192.168.1.100 in this example) has the 123.0.0 archive, the rolling update of kube-apiserver fails on node-2 (192.168.1.102) with:
Analysis:
The issue is caused by two main problems:

- When running metalk8s.orchestrate.apiserver to perform the rolling upgrade, kubelet restarts because of incomplete logic in cri.wait_pod (fixed in #3828, "salt: Handle duplicates in cri.wait_pod"), which ends up marking the repositories-bootstrap Pod as not ready, hence removing it from the endpoints, before running the upgrade on node-2 (see the sketch below).
- At this point, node-2 sees no mirror serving the 123.0.0 version of metalk8s-epel, which causes the failure.
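For illustration only, here is a minimal, hypothetical Python sketch of the kind of de-duplication that "handle duplicates in cri.wait_pod" implies: when the container listing returns several entries for the same Pod (e.g. a stale entry plus the one re-created after the kubelet restart), only the most recent entry should decide readiness. The `list_pods`, `_dedupe_latest` and `wait_pod_ready` names are assumptions for this sketch, not the actual MetalK8s Salt module API.

```python
# Hypothetical sketch (not the actual MetalK8s code): collapse duplicate
# entries for the same Pod name before deciding whether the Pod is ready,
# so a stale, not-ready entry cannot shadow the live one.
import time


def _dedupe_latest(entries):
    """Keep one entry per pod name, preferring the most recently created."""
    latest = {}
    for entry in entries:
        name = entry["name"]
        if name not in latest or entry["created_at"] > latest[name]["created_at"]:
            latest[name] = entry
    return list(latest.values())


def wait_pod_ready(list_pods, name, timeout=120, interval=5):
    """Poll `list_pods()` until the pod called `name` reports ready.

    `list_pods` is assumed to return dicts with "name", "created_at" and
    "ready" keys; duplicates for the same name are collapsed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for pod in _dedupe_latest(list_pods()):
            if pod["name"] == name and pod["ready"]:
                return True
        time.sleep(interval)
    return False
```

With de-duplication of this kind, a leftover not-ready entry from before the kubelet restart would no longer cause the repositories-bootstrap Pod to be reported as not ready, so it would stay in the endpoints for the duration of the rolling upgrade.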