Tenant Resource Quota admission controller doesn't block resources creation #49
I'm aware of this behavior and it's a viable trade-off, although not entirely satisfying. I'm still trying to figure out how to apply these constraints without reinventing the wheel, while ensuring the hard quota is absolutely honored. Right now, Capsule sets the Tenant quotas on each Namespace it handles and watches for events: once the sum of the used resources across all the Namespace resources exceeds the limit, we block further creations, as you noticed. This simple trick works well since we don't need a further CRD, and there's no need to listen on multiple Validating webhooks, since there could be tons of resource types handled by quotas. Regarding #14, I guess it's not related: as you said, the update is processed correctly. Let's keep this one open and try to figure out a better strategy in the future.
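The watch-and-aggregate behavior described above can be sketched as follows. This is a simplified illustration, not Capsule's actual code: names are hypothetical and quantities are plain `int64` values, whereas the real controller works with `resource.Quantity` from the ResourceQuota API.

```go
package main

import "fmt"

// usedByNamespace maps each namespace of a tenant to its observed
// usage of a single resource (simplified to int64 for illustration).
type usedByNamespace map[string]int64

// tenantQuotaExceeded reports whether the summed usage across all of a
// tenant's namespaces has crossed the tenant-level hard limit. This is
// the check the controller runs on every ResourceQuota status event.
func tenantQuotaExceeded(used usedByNamespace, hard int64) bool {
	var total int64
	for _, u := range used {
		total += u
	}
	return total > hard
}

func main() {
	used := usedByNamespace{"oil-dev": 3, "oil-staging": 4}
	// Tenant hard limit of 6 pods: 3+4 = 7 exceeds it, so the
	// controller would start blocking further creations everywhere.
	fmt.Println(tenantQuotaExceeded(used, 6)) // true
}
```

Because the check runs asynchronously on events, usage can overshoot the limit before the block kicks in, which is exactly the race this issue reports.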
Hi @prometherion, I think it's going to be important to address this problem in a future release. While it may be viable for us to scope a tenant quota to e.g. hosts, which kind of hard-enforces the tenant quota, we should honour the resources stretched across a tenant. I am happy to develop this with you. The last call concluded that we are looking for new ideas on how to solve this. While I am certainly not experienced enough to come up with a smart idea myself, I asked ChatGPT what it thinks about the problem. Here are its best thoughts: "Here are a few ideas for how you might address these issues:" Idea 3 seems interesting; tell me more.
Do you think something similar is achievable without a dedicated controller? We tried to overcome this odd scenario by creating a validation webhook to stop the update of the ResourceQuota object, thinking this would have prevented it. "It is possible to implement a queueing system for resource allocation without using a dedicated controller, but it will likely be more complex and may not provide the same level of control and guarantees as a dedicated controller." Already some interesting approaches. I'll have some more conversations and see what comes around. :)
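The "block ResourceQuota updates via a validating webhook" idea mentioned above can be sketched as a pure admission decision. This is a hypothetical simplification: the service-account name is an assumption for illustration, and a real webhook would inspect the full AdmissionReview request rather than just a username.

```go
package main

import (
	"fmt"
	"strings"
)

// allowQuotaUpdate sketches the admission decision: only the
// controller's own service account may mutate tenant-owned
// ResourceQuota objects; any other requester is denied. The
// "capsule-system" prefix is an illustrative assumption.
func allowQuotaUpdate(username string) (allowed bool, reason string) {
	if strings.HasPrefix(username, "system:serviceaccount:capsule-system:") {
		return true, ""
	}
	return false, "tenant-managed ResourceQuota objects are read-only"
}

func main() {
	ok, _ := allowQuotaUpdate("system:serviceaccount:capsule-system:capsule")
	fmt.Println(ok) // true: the controller may reconcile the quota
	ok, reason := allowQuotaUpdate("alice")
	fmt.Println(ok, reason) // denied for everyone else
}
```

Note that such a webhook only protects the ResourceQuota objects themselves; it does not close the race between resource creation and the asynchronous quota aggregation.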
All the proposals suggested by ChatGPT are valuable but not effective for a production use case, since they miss knowledge of the underlying internals of Kubernetes. The problem behind this issue is the fact that a ResourceQuota is a Namespace-scoped object, while a Tenant spans several Namespaces. The basic idea behind Capsule has been simplicity and reusing the same Kubernetes API primitives, delivering new ones only when strictly needed, to reduce the cognitive load in operating a multi-tenant environment. There are just two concrete ways to address this:
This is cumbersome and repetitive, also considering that there's already an implementation, although non-Tenant aware.
Essentially, it would be pretty similar to ResourceQuota but living at the cluster scope, with the same specification, plus a label selector that we could use to address the Tenant namespaces. The latter is the most reasonable solution to me, although it requires dealing with the community and finding consensus, besides making the fix compatible only starting from the release where the new resource lands. If you'd like to help me draft a KEP, I'd be eager to work together, as well as attend the community meeting.
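The cluster-scoped resource proposed above could look roughly like the following Go type. All field names here are illustrative assumptions, not part of any accepted KEP, and quantities are simplified to `int64` instead of `resource.Quantity`.

```go
package main

import "fmt"

// ClusterResourceQuotaSpec sketches the proposed cluster-scoped
// resource: the same shape as a ResourceQuota spec, plus a label
// selector that picks the tenant namespaces it applies to.
type ClusterResourceQuotaSpec struct {
	// NamespaceSelector selects namespaces by label; the label key
	// below is Capsule's tenant label, used here as an example.
	NamespaceSelector map[string]string
	// Hard holds the aggregate limits, keyed by resource name
	// (simplified to int64 for illustration).
	Hard map[string]int64
}

func main() {
	spec := ClusterResourceQuotaSpec{
		NamespaceSelector: map[string]string{"capsule.clastix.io/tenant": "oil"},
		Hard:              map[string]int64{"pods": 10, "count/secrets": 20},
	}
	fmt.Println(spec.Hard["pods"]) // 10
}
```

Because admission for such a resource would happen in the API server itself, it could enforce the aggregate limit synchronously, unlike the current event-driven approach.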
Just read about SchedulingGates as a new feature gate: https://kubernetes.io/blog/2022/12/26/pod-scheduling-readiness-alpha/ This could be interesting. In the use cases, they mention this:
I will have to play around with this feature a bit; maybe delaying scheduling might already help. But I would obviously also love to create a KEP :3!
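The gating semantics from the linked blog post can be sketched minimally: a Pod with a non-empty `spec.schedulingGates` list stays in a `SchedulingGated` state and is skipped by the scheduler until every gate is removed. The gate name below is a hypothetical example of what a quota controller might use.

```go
package main

import "fmt"

// schedulable mirrors the SchedulingGates rule: a Pod can only be
// considered by the scheduler once its gate list is empty. A quota
// controller could hold a gate until tenant capacity is available.
func schedulable(gates []string) bool {
	return len(gates) == 0
}

func main() {
	gates := []string{"quota.example.com/tenant-capacity"}
	fmt.Println(schedulable(gates)) // false: the Pod waits, gated
	gates = gates[:0]               // the controller lifts the gate
	fmt.Println(schedulable(gates)) // true: the Pod may be scheduled
}
```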
Yeah, this could be interesting, although remember that we're using ResourceQuota for multiple resources like PVC, Secret, PV, ConfigMap and, in the end, Pods.
I just created a PR #1088 that might help with this issue and #643. By subtracting the resource usage of all other namespaces from the desired tenant-level hard quota, instead of assigning the full tenant-level hard quota to every rq object, we prevent the creation of resources that would go over the tenant's quota. I have tested the proposed change somewhat locally, but if it might cause problems elsewhere that I am not aware of, feel free to disregard the proposal.
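The subtraction described in the comment above can be sketched as follows. This is a simplified illustration of the idea, not the PR's actual code: quantities are plain `int64` values and function names are assumptions.

```go
package main

import "fmt"

// namespaceHard computes the per-namespace hard limit as the tenant
// hard limit minus what all *other* namespaces already use, instead of
// writing the full tenant limit into every namespace's ResourceQuota.
func namespaceHard(tenantHard int64, used map[string]int64, ns string) int64 {
	var othersUsed int64
	for name, u := range used {
		if name != ns {
			othersUsed += u
		}
	}
	if othersUsed > tenantHard {
		return 0 // tenant already over its limit; allow nothing new here
	}
	return tenantHard - othersUsed
}

func main() {
	used := map[string]int64{"oil-dev": 3, "oil-staging": 4}
	// Tenant hard limit of 10 pods: oil-dev may grow up to 10-4 = 6,
	// so the namespaced quota admission blocks the 7th pod there.
	fmt.Println(namespaceHard(10, used, "oil-dev")) // 6
}
```

This lets the built-in ResourceQuota admission do the synchronous blocking, though a window remains between usage changes and the recomputation of each namespace's limit.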
I propose to reopen this issue @oliverbaehler @prometherion to keep track of the race condition we still have, even though with the smart PR by @lukasboettcher we now have a considerable improvement. I guess that as long as we don't have an admission webhook there's room for races. WDYT?
Yeah, of course; I just merged the feature to keep complexity low. Dario and I have talked about a possible solution. I have time this afternoon to verify whether that's working.
Bug description
Tenant Resource Quota admission controller doesn't block resources creation
How to reproduce
As cluster admin:
A tenant owner:
check used quota at tenant level
scale up pods
and check quota:
Expected behavior
Tenant Resource Quota admission controller should block resource creation crossing the set quota.
Logs
The Capsule controller has been started with the following args, but I see only INFO-level log messages:
Additional context
capsule --version
kubectl version