
Identify quota changes needed for scalability jobs, create pool of scalability projects #851

Closed
spiffxp opened this issue May 7, 2020 · 10 comments
Labels: area/prow (Setting up or working with prow in general, prow.k8s.io, prow build clusters), sig/scalability (Categorizes an issue or PR as relevant to SIG Scalability)

Comments

spiffxp (Member) commented May 7, 2020

The default quotas for an e2e project (e.g. k8s-infra-e2e-gce-project) are insufficient to run ci-kubernetes-e2e-gci-gce-scalability.

Currently this job runs in the google.com k8s-prow-builds cluster, using a project from that cluster's boskos scalability-project pool.

  • Identify what quotas are set that make these projects differ from a default e2e project (see the sketch after this list)
  • Identify what jobs this will allow us to migrate
  • Identify pool size
  • Provision pool in CNCF org
  • Add pool to k8s-infra-prow-build's boskos
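As a starting point for the first item, here's a minimal sketch of how the region-level quotas could be compared between a default e2e project and one of the google.com scalability projects; the choice of us-east1 and the jq filter are assumptions, only the project names come from this thread:

for project in k8s-infra-e2e-gce-project k8s-periodic-scale-1; do
  echo "=== ${project} ==="
  # quotas[] carries metric/limit/usage per region; pick out the two we care about
  gcloud compute regions describe us-east1 --project="${project}" --format=json |
    jq -r '.quotas[] | select(.metric=="CPUS" or .metric=="IN_USE_ADDRESSES") | "\(.metric): \(.limit)"'
done

Anything that differs between the two outputs is a candidate quota bump for the new pool.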
spiffxp added the sig/scalability, wg/k8s-infra, and area/prow labels on May 7, 2020
spiffxp (Member Author) commented May 7, 2020

I created k8s-infra-e2e-scale-project and tried manually running ci-kubernetes-e2e-gci-gce-scalability in k8s-infra-prow-build (with some local modifications so it would not dump results to gs://kubernetes-jenkins).

It got stuck on CPU quota.

I was able to retrieve project info for k8s-periodic-scale-1, so I compared it with k8s-infra-e2e-scale-project, both globally and in us-east1. In the google.com project, CPUs and in-use addresses had been bumped up to 125 in us-east1.

I filed a request to raise quota for k8s-infra-e2e-scale-project with the following info:

OSS Kubernetes scalability testing, 100 node clusters

https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-scalability-100

Goal is to migrate OSS Kubernetes tests to use GCP projects under CNCF-owned kubernetes.io org instead of google.com org

Trying to match quotas of google.com project "k8s-periodic-scale-1". Anticipate 10 projects like this.

The quota got bumped during the job run, and the job was able to complete successfully.
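As an aside, once a quota request like this is approved, here is a hedged sketch for confirming the bump has landed before re-running the job (the 125 target comes from the comparison above; the polling loop itself is just illustrative):

# re-check the us-east1 CPUS limit every minute until it reaches the requested 125
until gcloud compute regions describe us-east1 --project=k8s-infra-e2e-scale-project --format=json |
      jq -e '.quotas[] | select(.metric=="CPUS") | .limit >= 125' >/dev/null; do
  sleep 60
done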

Open questions:

  • do quotas differ in other regions?
  • what other jobs benefit from this?

spiffxp (Member Author) commented May 7, 2020

Jobs that use scalability-project:
  • 100 node release-blocking tests (definitely migrate these)
  • 100 node kubemark tests (these look like CI variants of merge-blocking jobs, so make room for them)
  • experiment tests (I would hold off on these for now)
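One rough way to enumerate the candidates, assuming these jobs request their project via kubetest's --gcp-project-type flag (a sketch, not an exhaustive audit):

# from a kubernetes/test-infra checkout: list job configs that ask boskos for a scalability-project
grep -rl -- '--gcp-project-type=scalability-project' config/jobs/ | sort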

spiffxp (Member Author) commented May 7, 2020

I did a visual inspection of this across all regions, then filtered down to just the two that had meaningful differences:

for p in $(<~/w/kubernetes/test-infra/config/prow/cluster/boskos-resources.yaml yq -r '.resources[] | select (.type=="scalability-project").names[]'); do
  echo $p...;
  diff --ignore-all-space \
    <(gcloud compute regions list --project=$p) \
    <(gcloud compute regions list --project=k8s-infra-e2e-gce-project) |\
  grep -E "us-east1|us-central1|---"
done
  • 16/16 had at least 125 CPU and 125 in-use addresses in us-east1
  • 2/16 had additional CPU quota in us-central1
  • 2/16 had 250 CPU instead of 125
  • 4/16 had 10240 instead of default 40960 disk
  • 4/16 had 50000 instead of default 40960 disk

So I'm inclined to suggest we stick with 125 CPU / in-use addresses for us-east1.

The existing pool is 16 projects, but there are 5 jobs I'm not sure we want to move over yet. So I'm going to set up 10 projects.
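Purely as a sketch of the scale involved (the real provisioning presumably goes through the kubernetes/k8s.io infra scripts rather than ad-hoc commands; the project naming, org, and billing values below are hypothetical placeholders):

# hypothetical naming; ORG_ID and BILLING_ACCOUNT are placeholders
for i in $(seq -w 1 10); do
  project="k8s-infra-e2e-scale-${i}"
  gcloud projects create "${project}" --organization="${ORG_ID}"
  gcloud beta billing projects link "${project}" --billing-account="${BILLING_ACCOUNT}"
done

Quota bumps would still need to be requested per project on top of this.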

spiffxp (Member Author) commented May 11, 2020

Blocked until #852 is resolved

spiffxp (Member Author) commented May 26, 2020

No longer blocked. I've created a pool of 5 projects to start with via changes in #898

spiffxp (Member Author) commented Jul 31, 2020

With an eye toward: kubernetes/test-infra#18550

Based on visual inspection of https://monitoring.prow.k8s.io/d/wSrfvNxWz/boskos-resource-usage?orgId=1&from=now-90d&to=now

Currently k8s-prow-builds' boskos has two pools:

  • scalability project: 15 total, peak usage of ~5
  • scalability presubmit projects: 45 total, peak usage of ~30

Currently k8s-infra-prow-build has:

  • scalability project: 5 total, peak usage of ~3

It's not clear to me whether the presubmit projects have different quotas than the regular projects
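The diff loop from earlier could answer that, pointed at the presubmit pool instead; the type name below is my guess at how that pool is keyed in boskos-resources.yaml:

for p in $(<~/w/kubernetes/test-infra/config/prow/cluster/boskos-resources.yaml yq -r '.resources[] | select (.type=="scalability-presubmit-project").names[]'); do
  echo $p...;
  diff --ignore-all-space \
    <(gcloud compute regions list --project=$p) \
    <(gcloud compute regions list --project=k8s-infra-e2e-gce-project) |\
  grep -E "us-east1|us-central1|---"
done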

spiffxp (Member Author) commented Aug 30, 2020

Added a canary job via kubernetes/test-infra#19049

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/92316/pull-kubernetes-e2e-gce-100-performance-canary/1299860480084414464/ - confirmed that the existing scalability-project pool will work for presubmits

Based on visual inspection of https://monitoring.prow.k8s.io/d/wSrfvNxWz/boskos-resource-usage?orgId=1&from=now-90d&to=now, after removing a kubemark presubmit from the set of merge-blocking jobs for kubernetes (ref: kubernetes/test-infra#18788)

k8s-prow-builds' boskos has two pools:

  • scalability project: 15 total, peak usage of ~5
  • scalability presubmit projects: 45 total, peak usage of ~17

k8s-infra-prow-build has:

  • scalability project: 5 total, peak usage of ~3

Going to provision the scalability pool up to 30 projects (5 + 17 + 3 = 25, plus 5 of overhead).
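For what it's worth, those peak-usage numbers could also be pulled programmatically instead of eyeballed, assuming the dashboard is backed by a Prometheus that exports a boskos_resources gauge with type/state labels (the metric name and endpoint here are assumptions):

# hypothetical Prometheus endpoint; 90-day peak of non-free scalability-project resources
curl -sG 'https://prometheus.example.com/api/v1/query' \
  --data-urlencode 'query=max_over_time(sum(boskos_resources{type="scalability-project",state!="free"})[90d:1h])' |
  jq -r '.data.result[0].value[1]'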

spiffxp (Member Author) commented Aug 30, 2020

Opened #1192 to grow the pool

spiffxp (Member Author) commented Oct 6, 2020

/close
Closing because it looks like we've got enough capacity for now. Specifically, there's enough green (free capacity) left over in the "scalability project (k8s-infra)" pool to account for the usage of the other two scalability project pools. This issue covers provisioning the projects, not migrating the jobs.

[Screenshot: boskos resource usage dashboard, 2020-10-06]

k8s-ci-robot (Contributor) commented

@spiffxp: Closing this issue.

In response to this:

/close
Closing because it looks like we've got enough capacity for now. Specifically, there's enough green (free capacity) left over in the "scalability project (k8s-infra)" pool to account for the usage of the other two scalability project pools. This issue covers provisioning the projects, not migrating the jobs.

[Screenshot: boskos resource usage dashboard, 2020-10-06]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
