Upgrades to 123.0.0 can fail after unneeded kubelet restart #3827
Labels: kind:bug (Something isn't working), topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s), topic:salt (Everything related to SaltStack in our product)
Component: salt
What happened:
On a 3-node upgrade, where multiple registry "replicas" were configured but only the bootstrap node (192.168.1.100 in this example) has the 123.0.0 archive, the rolling update of kube-apiserver fails on node-2 (192.168.1.102) with:
Analysis:
The issue is caused by two main problems:

- When running metalk8s.orchestrate.apiserver to perform the rolling upgrade, kubelet restarts because of incomplete logic in cri.wait_pod (fixed in #3828, "salt: Handle duplicates in cri.wait_pod"), which ends up marking the repositories-bootstrap Pod as not ready, hence removing it from the endpoints, before running the upgrade on node-2 (see the sketch below).
- At this point, node-2 sees no mirror serving the 123.0.0 version of metalk8s-epel, which causes the failure.
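For illustration only, here is a minimal, hypothetical Python sketch of the kind of de-duplication that "handle duplicates in cri.wait_pod" implies: when the container listing returns several entries for the same Pod (e.g. a stale entry plus the one re-created after the kubelet restart), only the most recent entry should decide readiness. The `list_pods`, `_dedupe_latest` and `wait_pod_ready` names are assumptions for this sketch, not the actual MetalK8s Salt module API.

```python
# Hypothetical sketch (not the actual MetalK8s code): collapse duplicate
# entries for the same Pod name before deciding whether the Pod is ready,
# so a stale, not-ready entry cannot shadow the live one.
import time


def _dedupe_latest(entries):
    """Keep one entry per pod name, preferring the most recently created."""
    latest = {}
    for entry in entries:
        name = entry["name"]
        if name not in latest or entry["created_at"] > latest[name]["created_at"]:
            latest[name] = entry
    return list(latest.values())


def wait_pod_ready(list_pods, name, timeout=120, interval=5):
    """Poll `list_pods()` until the pod called `name` reports ready.

    `list_pods` is assumed to return dicts with "name", "created_at" and
    "ready" keys; duplicates for the same name are collapsed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for pod in _dedupe_latest(list_pods()):
            if pod["name"] == name and pod["ready"]:
                return True
        time.sleep(interval)
    return False
```

With de-duplication of this kind, a leftover not-ready entry from before the kubelet restart would no longer cause the repositories-bootstrap Pod to be reported as not ready, so it would stay in the endpoints for the duration of the rolling upgrade.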