K8s ResourceQuota does not work correctly with Step Limits #4976

Open
skaegi opened this issue Jun 14, 2022 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@skaegi
Contributor

skaegi commented Jun 14, 2022

We are trying to use LimitRange and ResourceQuota to restrict Tekton resource usage by customers. In particular, we want to "limit" their memory and CPU use to prevent Tekton workloads from hogging resources that others need.

The Tekton controller tries to be clever here by spreading resource requests evenly across steps, but this approach doesn't work for limits, because container runtimes use the limits to define the hard upper bound on resource usage in each container. The net result is that each step retains its full limit, and the sum of those limits is used when trying to schedule, which can cause problems when a ResourceQuota is in play.
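To make the arithmetic concrete, here is a minimal sketch of the mismatch. It assumes a hypothetical four-step Task with made-up memory limits, and it relies on the fact that Kubernetes sums the limits of a pod's regular containers when charging a ResourceQuota. The names `Step`, `quota_charged_limit`, `peak_needed_limit`, and `fits_quota` are illustrative only and are not part of Tekton or Kubernetes.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Step:
    name: str          # hypothetical step name
    memory_limit: int  # memory limit in bytes


def quota_charged_limit(steps):
    """Limit charged against the quota: limits of all step containers are summed."""
    return sum(s.memory_limit for s in steps)


def peak_needed_limit(steps):
    """Limit that would suffice if the pod were charged for one step at a time."""
    return max(s.memory_limit for s in steps)


def fits_quota(steps, quota_limit):
    """True if the pod's summed limits fit within the namespace quota."""
    return quota_charged_limit(steps) <= quota_limit


# Hypothetical Task: steps run one after another, but all containers exist at once.
steps = [
    Step("clone", 1 * GIB),
    Step("build", 2 * GIB),
    Step("test", 1 * GIB),
    Step("push", 1 * GIB),
]

charged = quota_charged_limit(steps) // GIB   # 5 GiB charged to the quota
peak = peak_needed_limit(steps) // GIB        # 2 GiB actually needed at any moment
admitted = fits_quota(steps, 4 * GIB)         # False: rejected by a 4 GiB limits quota
```

In this sketch a namespace with a 4 GiB `limits.memory` quota rejects the pod even though no more than 2 GiB would ever be in use at once.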

With steps there is a well-known impedance mismatch with how Kubernetes wants to run containers. Tekton works around some of the issues by still running the containers in parallel but then using the Tekton entrypoint to serialize execution.
I've been hoping we might get first-class "Sequences" in Kubernetes, but unfortunately it does not look like we're ever going to get them. An alternate runtime approach in Tekton that might be worth considering is to implement the Steps in a single steps container, as this would let us put all step resource usage in one place and might resolve this. I suspect that ship sailed some time ago, though, and of course it would likely introduce a new set of problems.
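The serialization workaround mentioned above can be illustrated with a toy model. This is only a sketch under assumptions: threads stand in for containers, and marker files with hypothetical names (`step-N.done`) stand in for whatever signal the entrypoint uses. It is not Tekton's actual entrypoint code.

```python
import tempfile
import threading
import time
from pathlib import Path


def run_step(index, workdir, log):
    """Toy stand-in for one step container wrapped by an entrypoint."""
    wait_file = workdir / f"step-{index - 1}.done"  # hypothetical marker name
    post_file = workdir / f"step-{index}.done"
    # Every step except the first blocks until its predecessor signals completion.
    while index > 0 and not wait_file.exists():
        time.sleep(0.01)
    log.append(index)   # the step's real work would happen here
    post_file.touch()   # signal the next step that it may start


def run_pod(num_steps):
    """Start all 'containers' at once and return the order in which they ran."""
    log = []
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Start in reverse order to show that ordering comes from the markers,
        # not from the order in which the containers were started.
        threads = [
            threading.Thread(target=run_step, args=(i, workdir, log))
            for i in reversed(range(num_steps))
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return log


order = run_pod(4)
```

All four "containers" are alive for the whole run, which is why each one keeps its own limit, yet only one does work at a time.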

So... we might want to call this out as a Limitation for Tekton here, as Steps will always request the sum of their Limits, and this has consequences. We currently use only "Requests" and avoid "Limits" altogether in our LimitRanges and ResourceQuotas when using Resources with Tekton. Unfortunately, that means we need additional out-of-band mechanisms to manage resource use.

@skaegi skaegi added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 14, 2022
@skaegi
Contributor Author

skaegi commented Jun 22, 2022

I take this all back. This is a limitation when running Tekton with Kata Containers. I'm going to work there to see if I can remove this problem as it is more general and applies to any workload running in Kata.

@skaegi skaegi closed this as completed Jun 22, 2022
@skaegi
Contributor Author

skaegi commented Jun 28, 2022

/reopen
Touched up title and description slightly. This issue is still very relevant when trying to use ResourceQuotas and Limits in a namespace that runs Tekton Pipelines and Tasks.

@tekton-robot
Collaborator

@skaegi: Reopened this issue.

In response to this:

/reopen
Touched up title and description slightly. This issue is still very relevant when trying to use ResourceQuotas and Limits

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Jun 28, 2022
@lbernick lbernick changed the title from "Step Limitation with Limits" to "K8s ResourceQuota does not work correctly with Step Limits" Jun 30, 2022
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 28, 2022
@lbernick lbernick added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 28, 2022
Projects
Status: Todo
Development

No branches or pull requests

3 participants