adding in priority value in podSpec for controller, scheduler and apiserver #1631

erleene · 2019-06-18T15:07:27Z

We have found a bug in 1.13.x where mirror pods get evicted when the node is under pressure. This PR sets the priority directly so that those pods won't get evicted. We don't believe that this is required with 1.14+.

Relates to this issue kubernetes/kubernetes#73572
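
As a rough illustration of the work-around described above, the sketch below sets a priority value and a priority class name on a static pod manifest held as a Python dict. The helper name `with_critical_priority`, the sample manifest, and the choice of the `system-cluster-critical` class with value 2000000000 are assumptions for illustration only. The PR description does not say which class or value was actually used.

```python
import copy

# Assumption: the built-in "system-cluster-critical" class and its upstream
# value are used here for illustration; the PR text does not name the class
# or value that was actually set.
ASSUMED_CLASS_NAME = "system-cluster-critical"
ASSUMED_PRIORITY = 2000000000


def with_critical_priority(manifest, class_name=ASSUMED_CLASS_NAME,
                           priority=ASSUMED_PRIORITY):
    """Return a copy of a pod manifest with priority fields set in its spec."""
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    spec["priorityClassName"] = class_name
    spec["priority"] = priority
    return patched


# Hypothetical static pod manifest for the controller manager.
controller_manager = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kube-controller-manager", "namespace": "kube-system"},
    "spec": {"hostNetwork": True,
             "containers": [{"name": "kube-controller-manager"}]},
}

patched = with_critical_priority(controller_manager)
print(patched["spec"]["priorityClassName"], patched["spec"]["priority"])
```

Setting both fields follows the two commits in this PR, which first add the priority value and then the priority class name on top of it.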

k8s-ci-robot · 2019-06-18T15:07:29Z

Welcome @erleene!

It looks like this is your first PR to kubernetes-incubator/kube-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-incubator/kube-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2019-06-18T15:07:32Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: login-issues@jira.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

codecov-io · 2019-06-18T15:45:57Z

Codecov Report

Merging #1631 into v0.13.x will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           v0.13.x    #1631   +/-   ##
========================================
  Coverage    25.49%   25.49%           
========================================
  Files           98       98           
  Lines         5049     5049           
========================================
  Hits          1287     1287           
  Misses        3619     3619           
  Partials       143      143

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 289cd68...9d45c31. Read the comment docs.

davidmccormick · 2019-06-18T17:19:48Z

Please test before we merge

dominicgunn · 2019-06-19T14:38:06Z

This looks to have been fixed in Kubernetes 1.13.5+. The default for kube-aws is now 1.13.7, so this shouldn't impact newly created clusters.

Kubernetes upstream has fixed this for versions as far back as v1.11.x. Do we want to do the same in kube-aws?

davidmccormick · 2019-06-19T21:05:47Z

Yes, that has got us confused somewhat as we observed the behavior on a 1.13.7 cluster without this work-around! It shouldn’t happen.

erleene · 2019-06-20T11:21:51Z

@dominicgunn

The problem still occurs in 1.13.7: the controllers are being evicted, so none are running. The API server and the scheduler are up, and the kubelet shows the following:

11:01:46.328766    1525 certificate_manager.go:378] Certificate request was not signed: timed out waiting for the condition
11:01:46.328804    1525 certificate_manager.go:269] Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
11:10:49.829455    1525 log.go:172] http: multiple response.WriteHeader calls
11:14:54.907091    1525 certificate_manager.go:378] Certificate request was not signed: timed out waiting for the condition

Some more interesting output:

10:49:05.306713    1525 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://127.0.0.1/api/v1/services?li>
10:49:05.307641    1525 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://127.0.0.1/api/v1/pods?fi>
10:49:05.308780    1525 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://127.0.0.1/api/v1/nodes?fieldSel>

dominicgunn · 2019-06-20T11:38:36Z

Does this only happen in clusters created with kube-aws, or is it a problem in upstream k8s?

davidmccormick · 2019-06-20T11:49:25Z

Thanks. It was also confirmed that the controller-manager pods had been evicted. This is an upstream Kubernetes issue which may also affect other releases, e.g. 1.14.x and 1.15.x (although it is supposed to have been patched). It affects kube-aws customers deploying smaller controllers that are pushed near or over capacity, and won't affect well-sized nodes. What it has shown us, though, is that when resources are under pressure, some of the most critical components are chosen for eviction.

I suggest we merge this as a work-around on 1.13.x, test the behaviour on 1.14.x, and apply a similar work-around if it is observed on that branch.

Good work Erleen!

dominicgunn · 2019-06-20T11:51:51Z

Cool, I'm behind that. Code looks good.

dominicgunn · 2019-06-20T11:51:56Z

/approve

k8s-ci-robot · 2019-06-20T11:52:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dominicgunn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dominicgunn]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

davidmccormick · 2019-06-20T11:53:19Z

/lgtm

davidmccormick · 2019-06-20T11:53:55Z

Thanks for your first contribution! 🙏

k8s-ci-robot added the "cncf-cla: no" label (indicates the PR's author has not signed the CNCF CLA) Jun 18, 2019

k8s-ci-robot requested review from mumoshu and redbaron June 18, 2019 15:07

k8s-ci-robot added the "size/M" label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 18, 2019

erleene force-pushed the bugfix/mirror-pod-priorities branch 2 times, most recently from 7880864 to fd364ab June 18, 2019 15:50

k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) and removed the "cncf-cla: no" label (indicates the PR's author has not signed the CNCF CLA) Jun 18, 2019

davidmccormick changed the title ~~adding in priority value in podSpec for controller, scheduler and apiserver~~ WIP: adding in priority value in podSpec for controller, scheduler and apiserver Jun 19, 2019

k8s-ci-robot added the "do-not-merge/work-in-progress" label (indicates that a PR should not merge because it is a work in progress) Jun 19, 2019

adding in priority value for scheduler, controller, scheduler as a fix

4581ead

erleene force-pushed the bugfix/mirror-pod-priorities branch from fd364ab to 4581ead June 19, 2019 10:30

k8s-ci-robot added the "size/XS" label (denotes a PR that changes 0-9 lines, ignoring generated files) and removed the "size/M" label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 19, 2019

adding in priority calss name on top of priority value to controller,…

9d45c31

… scheduler and apiserver

k8s-ci-robot added the "size/M" label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the "size/XS" label (denotes a PR that changes 0-9 lines, ignoring generated files) Jun 19, 2019

k8s-ci-robot assigned redbaron Jun 20, 2019

davidmccormick assigned davidmccormick and unassigned redbaron Jun 20, 2019

k8s-ci-robot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) Jun 20, 2019

davidmccormick changed the title ~~WIP: adding in priority value in podSpec for controller, scheduler and apiserver~~ adding in priority value in podSpec for controller, scheduler and apiserver Jun 20, 2019

k8s-ci-robot removed the "do-not-merge/work-in-progress" label (indicates that a PR should not merge because it is a work in progress) Jun 20, 2019

k8s-ci-robot added the "lgtm" label (indicates that a PR is ready to be merged) Jun 20, 2019

davidmccormick merged commit ca91b7f into kubernetes-retired:v0.13.x Jun 20, 2019

davidmccormick added this to the v0.13.0-rc.3 milestone Jun 20, 2019

erleene deleted the bugfix/mirror-pod-priorities branch June 20, 2019 12:34
