node e2e tests - projects out of quota #17714
Comments
/kind cleanup |
/cc @ZhiFeng1993 |
/assign karan |
Also addresses exhaustion. |
Looking at the following: All of them use the same project and the same region (zone us-west1-b). All of them run n1-standard-1 instances, and over the last couple of hours usage seems to have been pinned at the 100-CPU quota. An easy fix would be to spread the tests over other regions as well (us-central1 has another 100-CPU quota). |
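To make the "spread over other regions" idea concrete, here is a minimal sketch that picks the region with the most CPU headroom. It assumes quota entries shaped like the `quotas` list in `gcloud compute regions describe --format=json` output (`metric`, `limit`, `usage`); the function names and the usage numbers are hypothetical, and this is not how the node e2e jobs actually choose a region.

```python
from typing import Dict, List, Optional


def cpu_headroom(quotas: List[dict]) -> float:
    """Return limit minus usage for the CPUS metric, or 0.0 if it is absent."""
    for q in quotas:
        if q.get("metric") == "CPUS":
            return float(q["limit"]) - float(q["usage"])
    return 0.0


def pick_region(quotas_by_region: Dict[str, List[dict]], cpus_needed: int) -> Optional[str]:
    """Return the region with the most free CPUs that can fit cpus_needed, else None."""
    best, best_room = None, 0.0
    for region, quotas in quotas_by_region.items():
        room = cpu_headroom(quotas)
        if room >= cpus_needed and room > best_room:
            best, best_room = region, room
    return best


# Illustrative numbers only: the 100-CPU limits come from the comment above,
# the usage values are assumed.
example = {
    "us-west1": [{"metric": "CPUS", "limit": 100.0, "usage": 100.0}],
    "us-central1": [{"metric": "CPUS", "limit": 100.0, "usage": 0.0}],
}
print(pick_region(example, cpus_needed=1))
```

Since an n1-standard-1 instance has one vCPU, `cpus_needed=1` corresponds to a single test VM. Note that this only helps if the instances are live; orphaned VMs would eventually exhaust the new region too.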
I would confirm whether those instances are from live running tests or whether they're detritus left over from tests that couldn't ssh to the nodes they'd created and failed to clean up. I LGTM'ed the PR to try a new region, but if the reason is the latter, you may hit quota again. See kubernetes/kubernetes#89892 (comment), where we tested this by clearing out VMs for a project that appeared to have network issues. |
Seems like the quota issue is resolved, and I'm seeing 4-6 VMs in the project right now, so it doesn't seem like VMs are being orphaned. The new issue is around docker not running (for which I'll cut a new issue). |
I'm not sure this is quite finished; see https://testgrid.k8s.io/sig-node-containerd#pull-node-e2e |
We should cut a new issue IMO and check all projects we use and their IP quota. If we allow up to 100 VMs in a project, we should have at least that many IPs in quota as well. |
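A rough sketch of the check described above: for each project, compare the in-use IP address quota against the VM cap. It assumes one external IP per VM and quota entries shaped like the `quotas` list in `gcloud compute regions describe --format=json` output; the `IN_USE_ADDRESSES` metric name should be verified against the real output, and the project names and limits below are made up.

```python
from typing import Dict, List


def quota_limit(quotas: List[dict], metric: str) -> float:
    """Return the limit for a metric, or 0.0 if the metric is not listed."""
    for q in quotas:
        if q.get("metric") == metric:
            return float(q["limit"])
    return 0.0


def ip_shortfalls(quotas_by_project: Dict[str, List[dict]], max_vms: int = 100) -> Dict[str, int]:
    """Return {project: missing_ips} for projects whose IP limit is below max_vms."""
    shortfalls = {}
    for project, quotas in quotas_by_project.items():
        ip_limit = quota_limit(quotas, "IN_USE_ADDRESSES")  # assumed metric name
        if ip_limit < max_vms:
            shortfalls[project] = int(max_vms - ip_limit)
    return shortfalls


# Hypothetical projects and limits, for illustration only.
projects = {
    "example-project-a": [{"metric": "IN_USE_ADDRESSES", "limit": 100.0, "usage": 5.0}],
    "example-project-b": [{"metric": "IN_USE_ADDRESSES", "limit": 23.0, "usage": 23.0}],
}
print(ip_shortfalls(projects))
```

Any project that shows up in the result would run out of IPs before it reaches the 100-VM cap.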
I don't know how to go about checking the projects like this. |
We should get guidance from @spiffxp and @BenTheElder |
Aaron and I don't have much to do with this project currently. When SIGs ask for GCE quota, we ask them to use the boskos pools to rent a project for one test run at a time, which makes cleanup very straightforward (destroy all project contents) and capacity planning easy (monitor how full or empty the pools are). Using a single project is a bit of an anti-pattern. |
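For readers unfamiliar with the rent-a-project pattern, the toy model below illustrates why it simplifies cleanup and capacity planning. It is an in-memory sketch only: the `ProjectPool` class, its methods, and the project names are hypothetical and are not the Boskos API.

```python
class ProjectPool:
    """Toy pool of projects, each rented to one test run at a time."""

    def __init__(self, projects):
        self.free = list(projects)
        self.busy = {}  # project -> owner (e.g. a job name)
        self.contents = {p: [] for p in projects}  # stand-in for VMs, disks, IPs

    def acquire(self, owner):
        """Rent the next free project, or fail loudly if the pool is empty."""
        if not self.free:
            raise RuntimeError("pool exhausted")
        project = self.free.pop(0)
        self.busy[project] = owner
        return project

    def release(self, project):
        """Return a project: cleanup is simply destroying everything in it."""
        del self.busy[project]
        self.contents[project].clear()
        self.free.append(project)

    def utilization(self):
        """Fraction of projects currently rented; the capacity-planning signal."""
        total = len(self.free) + len(self.busy)
        return len(self.busy) / total if total else 0.0


pool = ProjectPool(["example-project-a", "example-project-b"])
rented = pool.acquire("pull-node-e2e")
pool.contents[rented].append("orphaned-vm")  # a test that forgot to clean up
print(pool.utilization())
pool.release(rented)
print(pool.contents[rented])
```

Because a released project is wiped wholesale, a test that fails to delete its own VMs cannot leak quota into the next run, which is the failure mode discussed earlier in this thread.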
So we need to figure out why it's tied to a specific gcp project, and figure out a way to nullify that requirement. Does that sound right? |
And move to using a boskos pool for (any?/all?) tests that currently have a specific project/region? Cause yea, it would make sense to me that we don't care where these things run, just that they do. |
Many sig-node tests are failing consistently due to CPU exhaustion:
They are tracked in https://docs.google.com/spreadsheets/d/1mEU8B2_PmMwwgp-_xnyp7QYMBwcLoA9NNlHwDyMvO0Y/edit#gid=0.
For one of the projects, looks like it's at 100 instances already:
A good starting point will be to: