fix(resourcequotas): Update namespace-specific hard quota calculation logic #1088
… logic Signed-off-by: Lukas Boettcher <1340215+lukasboettcher@users.noreply.github.com>
That's great! I have tested your fork, and I don't think it breaks any other functionality. However, it does not entirely fix the problem described in the issue, so I would like @prometherion's opinion on this as well.
This looks smart and elegant; I wonder why we didn't think of it. @oliverbaehler, I remember that Resource Quotas at the Tenant scope have been our Achilles' heel. If the proposed changes prevent the overflow, it's a huge +1 from me.
May I ask you to elaborate a bit more? I suspect we have a small time window where the hard quota is not enforced, don't we?
It does not prevent it directly; at least, I am still able to overflow namespaces with two consecutive scale commands. Tenant Quota:
oil-dev:
oil-test:
But the calculation used here is more consistent, so it's a step in the right direction.

We still need to cover the case where a lot of scaling or resource updates happen at the same time and our controller is too slow to update the quotas. One way to prevent these race conditions is to have a webhook that calculates the resources requested by any resource being created or updated, as you have already pointed out in earlier comments. Essentially we need this function here:

There is a big question mark over the performance impact if we implement it like this. But I don't think we can get around needing a validating webhook that validates all the objects; the question is what happens in the webhook function.

I was also wondering whether we should implement something like a locking mechanism, so that resource quotas are locked while they are being updated or synced.
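To make the locking idea concrete, here is a minimal sketch of how an admission-time check could serialize quota decisions for a tenant. It is not Capsule code: `tenantLedger`, `newTenantLedger`, and `admit` are hypothetical names, the quota is reduced to a single integer resource, and the state lives in memory of one process. A real webhook running several replicas would need shared state or leader election on top of this.

```go
package main

import (
	"fmt"
	"sync"
)

// tenantLedger is a hypothetical in-memory record of one tenant-wide hard
// quota and the usage already admitted in each of the tenant's namespaces.
type tenantLedger struct {
	mu   sync.Mutex
	hard int64            // tenant-wide hard limit (assumed single resource)
	used map[string]int64 // admitted usage per namespace
}

func newTenantLedger(hard int64) *tenantLedger {
	return &tenantLedger{hard: hard, used: make(map[string]int64)}
}

// admit checks and reserves delta units for a namespace in one critical
// section, so two concurrent requests cannot both pass the check against
// the same stale total.
func (l *tenantLedger) admit(namespace string, delta int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	var total int64
	for _, u := range l.used {
		total += u
	}
	if total+delta > l.hard {
		return false // request would overflow the tenant quota
	}
	l.used[namespace] += delta
	return true
}

func main() {
	ledger := newTenantLedger(10)
	namespaces := []string{"oil-dev", "oil-test"}
	results := make([]bool, len(namespaces))

	// Two scale requests of 8 units each arrive at the same time.
	var wg sync.WaitGroup
	for i, ns := range namespaces {
		wg.Add(1)
		go func(i int, ns string) {
			defer wg.Done()
			results[i] = ledger.admit(ns, 8)
		}(i, ns)
	}
	wg.Wait()

	// Exactly one of the two requests is admitted, whichever took the lock first.
	fmt.Println("exactly one admitted:", results[0] != results[1])
}
```

The point of the sketch is that the check and the reservation happen under the same lock; a controller that reconciles usage after the fact leaves exactly the window in which both requests see the old total. The cost is that every admission for a tenant is serialized, which is where the performance question comes in.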
I can reproduce the problem with quick consecutive scaling or, for object count quotas, when creating resources in separate namespaces in a single request. For compute resource quotas this seems to hold up. However, I believe that is only because of the order and timing in which these quotas are updated and evaluated, so there is definitely a race condition.
Intends to fix #49