This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Use a Deployment for kube-apiserver #1455

Open
wants to merge 1 commit into base: master from meyskens/apiserver-no-down-update

Conversation

meyskens
Contributor

Use a Deployment for kube-apiserver

First of all: this isn't a trivial change and it is potentially controversial. It is only one way to tackle the issue; this is my proposal.

Current situation:
When updating the kube-apiserver deployment on bare metal (and potentially other platforms too) there is a bug where Helm returns an error because its API server endpoint goes down. This happens because Lokomotive talks to one of the controller nodes directly, without a proxy in front of it.
This is only an issue when there are multiple controller nodes; in that case Lokomotive switches from a Deployment to a DaemonSet. DaemonSets, however, do not support running 2 pods on the same machine, which is needed for a zero-downtime surge rollout of a new update. Deployments do, and thanks to SO_REUSEADDR it works nicely, allowing the API server to stay up during its own rollout.
This issue came back around because it would also break the certificate rotation logic on bare metal platforms.
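To make that concrete, this is roughly the rollout behaviour a Deployment can be given and a DaemonSet cannot (the values are illustrative, not necessarily what the chart will ship); it goes under the Deployment's spec:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # never take the running apiserver down first
    maxSurge: 1         # bring the replacement up next to it (both bind :6443 on the host)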

Ideas I had to fix this:

  • Implement a retry loop: tricky with timing, and Helm itself will get into trouble when it sees a failed rollout
  • Set Helm to Atomic=false (we probably don't want that)
  • Add a loadbalancer in front of it (fun fact: this was one of the Typhoon modifications we did at source{d})
  • This one: make a Deployment with enforced AntiAffinity rules to replicate a DaemonSet while keeping all the rollout features of a Deployment (see the sketch below)
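
A rough sketch of that last option, to show what I mean (label keys, image, and container details are placeholders rather than the chart's real values; the important bit is that the anti-affinity is keyed on the Helm release revision, so two pods of the same revision never share a node, while a pod of a new revision may briefly surge onto a node next to the old one):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  replicas: 3                      # one per controller node
  selector:
    matchLabels:
      k8s-app: kube-apiserver      # placeholder selector label
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
        apiserver-revision: {{ .Release.Revision | quote }}   # placeholder label name
    spec:
      hostNetwork: true            # still serves :6443 on the controller itself
      nodeSelector:
        node.kubernetes.io/controller: "true"
      # plus whatever tolerations the current manifest already carries
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: apiserver-revision
                operator: In
                values:
                - {{ .Release.Revision | quote }}
            topologyKey: kubernetes.io/hostname
      containers:
      - name: kube-apiserver
        image: k8s.gcr.io/kube-apiserver   # placeholder; real image and flags come from the chart

Once the new-revision pod is Ready the old one is terminated, so each controller node ends up with exactly one apiserver again.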

Commit description:
Before, a DaemonSet was used to deploy multiple kube-apiservers, each
bound to host port 6443. Because a DaemonSet does not support a rollover
without one pod being unavailable, this caused one API server endpoint
to become unavailable on update.
On systems with no network-level load balancer in front of them, this
causes Helm to error, as it can no longer check how the rollout of its
update is going, which in turn means the kube-apiserver is never updated.

This changes the multi-controller setup to use a Deployment, just like the
single-controller setup. It uses Pod AntiAffinity rules to spread the pods
across all controller nodes.

How to use

Deploy multi-controller clusters

Testing done

TODO: upgrade tests, waiting for opinions before spending time on this

Signed-off-by: Maartje Eyskens <maartje@kinvolk.io>
@meyskens meyskens force-pushed the meyskens/apiserver-no-down-update branch from 6f50021 to 3053f64 Compare March 29, 2021 13:48
Member

@ipochi ipochi left a comment


Are there any side effects/cons that you can think of in moving from a DaemonSet to a Deployment?

If the constraint is not satisfied, I believe the default is to not schedule the pod. Is this something that can result in a broken cluster?

operator: In
values:
- {{ .Release.Revision | quote }}
topologyKey: kubernetes.io/hostname
Member


For topologyKey, wouldn't it be more correct to use either of these labels:

node.kubernetes.io/master
node.kubernetes.io/controller=true

Member

@ipochi ipochi May 3, 2021


I found this in the Kubernetes docs:

For requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, the admission controller
LimitPodHardAntiAffinityTopology was introduced to limit topologyKey to kubernetes.io/hostname. If you want to make it
available for custom topologies, you may modify the admission controller, or disable it.

https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#an-example-of-a-pod-that-uses-pod-affinity
