Drop PodDisruptionBudget for calico-kube-controllers #183

Merged: 1 commit into gardener:master on Jul 15, 2022

timebertt

How to categorize this PR?

/area networking
/kind bug

What this PR does / why we need it:

Drop the PodDisruptionBudget for calico-kube-controllers, because it is a singleton and the PDB can prevent VPA from scaling it up.

A few references for context:

Which issue(s) this PR fixes:

calico-kube-controllers was in CrashLoopBackOff because of OOM kills:

```
$ ks get po -l k8s-app=calico-kube-controllers
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7d557d76c6-dqq8k   0/1     CrashLoopBackOff   59         4h43m
```

Hence, the PDB had status.disruptionsAllowed=0:

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: calico-kube-controllers
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers
status:
  currentHealthy: 0
  desiredHealthy: 0
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1
```
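
To make the status values above easier to follow, here is a minimal sketch of how `disruptionsAllowed` can be derived for a PDB that uses `maxUnavailable`. It is a simplified model of the disruption controller's arithmetic, not its actual code; the function name and parameters are hypothetical.

```python
from typing import Tuple


def disruptions_allowed(expected_pods: int, current_healthy: int,
                        max_unavailable: int) -> Tuple[int, int]:
    """Return (desired_healthy, disruptions_allowed) for a maxUnavailable PDB.

    Simplified assumption: desiredHealthy is expectedPods minus maxUnavailable,
    and disruptionsAllowed is the surplus of healthy pods over that number.
    Both values are floored at zero.
    """
    desired_healthy = max(expected_pods - max_unavailable, 0)
    allowed = max(current_healthy - desired_healthy, 0)
    return desired_healthy, allowed


# Values from the status above: one expected pod, none healthy, maxUnavailable=1.
print(disruptions_allowed(expected_pods=1, current_healthy=0, max_unavailable=1))

# For comparison: a healthy singleton leaves room for exactly one disruption.
print(disruptions_allowed(expected_pods=1, current_healthy=1, max_unavailable=1))
```

Under these assumptions, a crash-looping singleton always ends up with zero allowed disruptions, because there is no healthy pod left to "spend".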

This caused kube-apiserver to deny the eviction requested by VPA:

```
I0519 01:18:03.188662       1 update_priority_calculator.go:114] quick OOM detected in pod kube-system/calico-kube-controllers-7d557d76c6-g7lv2, container calico-kube-controllers
I0519 01:18:03.188675       1 update_priority_calculator.go:143] pod accepted for update kube-system/calico-kube-controllers-7d557d76c6-g7lv2 with priority 0.37876744005850843
I0519 01:18:03.188692       1 updater.go:215] evicting pod calico-kube-controllers-7d557d76c6-g7lv2
E0519 01:18:03.191952       1 pods_eviction_restriction.go:141] failed to evict pod kube-system/calico-kube-controllers-7d557d76c6-g7lv2, error: Cannot evict pod as it would violate the pod's disruption budget.
W0519 01:18:03.191970       1 updater.go:218] evicting pod calico-kube-controllers-7d557d76c6-g7lv2 failed: Cannot evict pod as it would violate the pod's disruption budget.
```

The underlying issue is more general and not specific to calico-kube-controllers. For now, however, removing PDBs for singleton pods is the best way to ensure that VPA keeps scaling up components that are stuck in CrashLoopBackOff because of OOM kills.
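
The deadlock described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Pod` class, `try_evict`, `reconcile`, and all memory numbers are hypothetical, and the model assumes that a new VPA recommendation only takes effect once the pod is evicted and recreated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    memory_limit_mi: int   # current memory limit (illustrative value)
    working_set_mi: int    # memory the workload actually needs (illustrative value)

    @property
    def healthy(self) -> bool:
        # Below its needs, the container is OOM-killed and never becomes ready.
        return self.memory_limit_mi >= self.working_set_mi


def try_evict(pod: Pod, max_unavailable: Optional[int]) -> bool:
    """Mimic the eviction check for a singleton; None means no PDB exists."""
    if max_unavailable is None:
        return True
    current_healthy = 1 if pod.healthy else 0
    desired_healthy = max(1 - max_unavailable, 0)
    return current_healthy - desired_healthy > 0


def reconcile(pod: Pod, recommended_mi: int, max_unavailable: Optional[int]) -> Pod:
    """One updater pass: the recommendation is applied only to a recreated pod."""
    if try_evict(pod, max_unavailable):
        return Pod(recommended_mi, pod.working_set_mi)
    return pod  # eviction denied, the pod keeps its old limit


crashing = Pod(memory_limit_mi=100, working_set_mi=150)

# With the PDB (maxUnavailable=1), the unhealthy singleton can never be evicted.
print(reconcile(crashing, recommended_mi=200, max_unavailable=1).healthy)

# Without a PDB, the eviction succeeds and the recreated pod gets the higher limit.
print(reconcile(crashing, recommended_mi=200, max_unavailable=None).healthy)
```

In this model the pod with a PDB stays unhealthy no matter how often the updater runs, which mirrors the repeated "Cannot evict pod" errors in the VPA logs.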

Special notes for your reviewer:

Release note:

The `PodDisruptionBudget` for `calico-kube-controllers` is removed, as it is a singleton and can prevent VPA from scaling it up.

@timebertt timebertt requested review from a team as code owners May 19, 2022 07:52
@gardener-robot gardener-robot added area/networking Networking related kind/bug Bug needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels May 19, 2022
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 19, 2022

@DockToFuture DockToFuture left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels May 19, 2022
@timebertt

@DockToFuture @ScheererJ should we merge this PR?

@timebertt

ping @DockToFuture @ScheererJ


@ScheererJ ScheererJ left a comment

/lgtm

@ScheererJ ScheererJ merged commit 6b404b6 into gardener:master Jul 15, 2022
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jul 15, 2022
@timebertt timebertt deleted the controllers-pdb branch July 15, 2022 12:17
Labels
area/networking Networking related kind/bug Bug needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)