Migrate merge-blocking jobs to dedicated cluster: pull-kubernetes-node-e2e #18851

Closed
spiffxp opened this issue Aug 14, 2020 · 6 comments
Labels
area/jobs sig/testing Categorizes an issue or PR as relevant to SIG Testing.

@spiffxp
Member

spiffxp commented Aug 14, 2020

What should be cleaned up or changed:

This is part of #18550

To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.

Migrate pull-kubernetes-node-e2e to k8s-infra-prow-build by adding a cluster: k8s-infra-prow-build field to the job.

NOTE: migrating this job is not as straightforward as some of the other #18550 issues, because we also need to:

  • Switch away from a fixed GCP project to boskos-managed projects
    • Replace --gcp-project=k8s-jkns-pr-node-e2e with --gcp-project-type=gce-project
  • If this turns out to break things, revert and ask for help
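The two config changes listed above can be sketched as a small transformation. This is only an illustration, not the actual change: the real job definition is YAML in the test-infra repository, and the dict layout (`spec` / `containers` / `args`), the `migrate_job` helper, and the `--placeholder-flag` argument are assumptions made for the example. Only the job name, the cluster name, and the two `--gcp-project*` flags come from this issue.

```python
import copy

# Values taken from the issue text.
TARGET_CLUSTER = "k8s-infra-prow-build"
FIXED_PROJECT_FLAG = "--gcp-project=k8s-jkns-pr-node-e2e"
BOSKOS_PROJECT_FLAG = "--gcp-project-type=gce-project"


def migrate_job(job):
    """Return a copy of a job dict with the cluster set and the project flag swapped.

    The dict shape is an assumed, simplified mirror of a Prow job entry.
    """
    migrated = copy.deepcopy(job)
    migrated["cluster"] = TARGET_CLUSTER
    for container in migrated.get("spec", {}).get("containers", []):
        # Replace only the fixed-project flag; leave every other argument as-is.
        container["args"] = [
            BOSKOS_PROJECT_FLAG if arg == FIXED_PROJECT_FLAG else arg
            for arg in container.get("args", [])
        ]
    return migrated


# Hypothetical, heavily trimmed job entry used only to exercise the helper.
example_job = {
    "name": "pull-kubernetes-node-e2e",
    "spec": {
        "containers": [
            {"args": ["--placeholder-flag", "--gcp-project=k8s-jkns-pr-node-e2e"]}
        ]
    },
}

migrated_job = migrate_job(example_job)
```

Working on a deep copy keeps the original entry available, which makes it easy to diff the two or to revert if the migration breaks things.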

Once the PR has merged, note the date/time it merged. This will allow you to compare before/after behavior.
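As a rough sketch of why the merge time matters, the snippet below splits job runs at that timestamp and compares failure rates on each side. The `failure_rates` helper, the `(start_time, passed)` tuple format, the merge hour, and all sample runs are hypothetical; in practice the data would come from wherever you track job results.

```python
from datetime import datetime, timezone


def failure_rates(runs, merged_at):
    """Return (before_rate, after_rate) for runs split at merged_at.

    runs is an iterable of (start_time, passed) tuples; a rate is None
    when there are no runs on that side of the merge.
    """
    def rate(results):
        if not results:
            return None
        failures = sum(1 for passed in results if not passed)
        return failures / len(results)

    before = [passed for started, passed in runs if started < merged_at]
    after = [passed for started, passed in runs if started >= merged_at]
    return rate(before), rate(after)


# Hypothetical merge time and run history, for illustration only.
merged_at = datetime(2020, 8, 19, 12, 0, tzinfo=timezone.utc)
sample_runs = [
    (datetime(2020, 8, 18, 10, 0, tzinfo=timezone.utc), True),
    (datetime(2020, 8, 18, 15, 0, tzinfo=timezone.utc), False),
    (datetime(2020, 8, 19, 9, 0, tzinfo=timezone.utc), True),
    (datetime(2020, 8, 19, 11, 0, tzinfo=timezone.utc), True),
    (datetime(2020, 8, 19, 14, 0, tzinfo=timezone.utc), True),
    (datetime(2020, 8, 20, 10, 0, tzinfo=timezone.utc), True),
]

before_rate, after_rate = failure_rates(sample_runs, merged_at)
```

With so few runs the numbers mean little; the point is only that a recorded merge time gives a clean boundary for the comparison.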

Things to watch for the job

Things to watch for the build cluster

Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.

/wg k8s-infra
/sig testing
/area jobs
/help

@k8s-ci-robot k8s-ci-robot added wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. area/jobs labels Aug 14, 2020
@spiffxp
Member Author

spiffxp commented Aug 19, 2020

/remove-help
/assign

@k8s-ci-robot k8s-ci-robot removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 19, 2020
@spiffxp
Member Author

spiffxp commented Aug 19, 2020

Opened #18915

@RobertKielty
Member

@spiffxp I marked this as In Progress based on #18915 having merged.

When can we call this complete?

@spiffxp
Member Author

spiffxp commented Sep 9, 2020

The PR merged on 2020-08-19, which is too long ago to cleanly show before/after data using testgrid or prow.k8s.io.

From a local Grafana instance that I run against k8s-gubernator:build, it looks like the job runs more reliably and with a comparable failure rate under load.
[Screenshot: 2020-09-09, 3:40:48 PM]

https://storage.googleapis.com/k8s-gubernator/triage/index.html?date=2020-08-30&pr=1&job=pull-kubernetes-node-e2e

A screenshot of triage from 2020-08-30 is early enough to pick up the before/after performance, and things look no worse as far as I can see. I'm guessing the spike of failures immediately afterward is unrelated, or has since been corrected.
[Screenshot: 2020-09-09, 3:43:25 PM]

CPU limit usage

CPU limit usage looks reasonable. As with other jobs, we need most of the CPU up front for building; in this case, all of the testing CPU usage happens on nodes spun up elsewhere. If we had a shared build, we could take the CPU requirements way down.
[Screenshot: 2020-09-09, 3:47:19 PM]

Memory limit usage

Same story with memory limit usage.
[Screenshot: 2020-09-09, 3:50:28 PM]

@spiffxp
Member Author

spiffxp commented Sep 9, 2020

/close
I think this is good enough

Apologies for falling behind on this one. It should have been in Monitoring, and I just didn't have time to sit still and check in on it until now.

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.


3 participants