Use a Deployment for kube-apiserver
First of all, this isn't a trivial change and it is potentially controversial; it is only one way to tackle the issue, and this is my proposal.
Current situation:
When updating the kube-apiserver on bare metal (and potentially other platforms too) there is a bug where Helm returns an error because its API server endpoint goes down. This happens because Lokomotive talks directly to one of the controller nodes, with no proxy in front of it.
This is only an issue when there are multiple controller nodes, because in that case Lokomotive switches from a Deployment to a DaemonSet. DaemonSets do not support running two pods on the same machine, which is what a rollout without any endpoint downtime requires. Deployments do, and thanks to SO_REUSEADDR this works nicely, allowing the API server to stay up during its own rollout.
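For illustration, a minimal sketch of the rolling-update settings that make this possible on a Deployment (the exact values here are my assumptions, not necessarily what the final manifest uses):

```yaml
# Rolling-update settings that let a new kube-apiserver pod start next to the
# old one on the same controller node; both pods are assumed to bind host
# port 6443 via SO_REUSEADDR, so the endpoint never disappears mid-rollout.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # bring up one extra pod first
      maxUnavailable: 0  # never stop a running API server before its replacement is ready
```

A DaemonSet rolling update, by contrast, has to remove the old pod before starting its replacement, which is exactly the endpoint gap described above.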
This issue came back around because it would also break the certificate rotation logic on bare metal platforms.
Ideas I had to fix this:
Commit description:
Before, a DaemonSet was used to deploy multiple kube-apiservers, each bound to host port 6443. Because DaemonSets do not support a rollout without one pod being unavailable, this caused one API server endpoint to become unavailable on update.
On systems where no network-level load balancer is placed in front of them, this causes Helm to error, as it can no longer check how the rollout of its update is going, which in turn means the kube-apiserver is never updated.
This changes the multi-controller setup to use a Deployment just like the single-controller setup. It uses pod anti-affinity rules to spread the pods across all controller nodes.
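As a rough illustration of this commit, a sketch of what such a Deployment could look like; the labels, node selector and image below are assumptions for readability, not the exact manifest from this change:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  replicas: 3                      # one pod per controller node
  selector:
    matchLabels:
      k8s-app: kube-apiserver
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
    spec:
      hostNetwork: true            # serve on the node's own port 6443
      nodeSelector:
        node-role.kubernetes.io/master: ""   # assumed controller-node label
      affinity:
        podAntiAffinity:
          # "preferred" rather than "required" so the surge pod of a rollout
          # may temporarily share a node with the pod it is replacing.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  k8s-app: kube-apiserver
              topologyKey: kubernetes.io/hostname
      containers:
      - name: kube-apiserver
        image: k8s.gcr.io/kube-apiserver   # placeholder image
```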
How to use
Deploy multi-controller clusters.
Testing done
TODO: upgrade tests; waiting for opinions before spending time on this.