FR: Task-level (and maybe Pipeline-level) resource requests and limits #4470

Closed
lbernick opened this issue Jan 12, 2022 · 16 comments · Fixed by #5082
Assignees
Labels
kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@lbernick
Member

lbernick commented Jan 12, 2022

Feature request

Currently, users can set resource requests and limits at the Step level, which are summed to determine the resource requests of the pod that runs a TaskRun. However, some users want to be able to directly specify the resource requests for a Task, rather than per Step.
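To make the summing concrete, here is a minimal sketch of how per-Step CPU requests add up to the request the pod is scheduled with. The Step names and values are hypothetical, and the helper only handles plain-core and millicore CPU strings; it is an illustration, not Tekton or Kubernetes code.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def pod_cpu_request(step_requests: dict[str, str]) -> int:
    """Sum the CPU requests of all Step containers, in millicores."""
    return sum(cpu_to_millicores(q) for q in step_requests.values())


# Hypothetical Steps: only "build" is resource intensive.
steps = {"clone": "250m", "build": "2", "upload": "250m"}
total = pod_cpu_request(steps)
print(f"pod CPU request: {total}m")
```

Because Steps run one after another, the pod in this sketch never needs more than the 2 cores used by "build", yet it is scheduled against the sum of all three requests. That gap is what a Task-level setting would let users close.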

Related issues

@lbernick lbernick added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 12, 2022
@lbernick
Member Author

lbernick commented Mar 9, 2022

/kind design
/assign

@tekton-robot tekton-robot added the kind/design Categorizes issue or PR as related to design. label Mar 9, 2022
@lbernick
Member Author

a bit of a wrinkle here. Let's say we want to allow people to specify the total resource requests and limits that should be used by all steps in their task:

  • k8s states that containers without resource limits are considered to have higher limits than those with limits configured (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources)
  • If the task resource limit is applied to only one container, the pod will therefore not have an effective limit.
  • If the task limit is applied to each container, the pod has a much higher limit than desired. This is especially problematic if requests are not set, because the request will then automatically be set to the same value as the limit, and the pod may have difficulty being scheduled.
  • If the task limit is spread out among containers, a task where one step is more resource intensive than all the others could get OOM-killed or throttled.
  • resource requirements can't be updated, so we can't dynamically adjust them as steps run.
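The three ways of applying a task limit listed above can be compared with a small sketch. The function names, the 1024 MiB limit, and the four-step task are all hypothetical; `None` stands for a container with no limit, which is treated here as making the pod's effective limit unbounded.

```python
from typing import Optional

# One entry per step container, in MiB; None means "no limit set".
Limits = list[Optional[int]]


def only_first(task_limit: int, n: int) -> Limits:
    """Put the whole task limit on one container, leave the rest unlimited."""
    return [task_limit] + [None] * (n - 1)


def every_container(task_limit: int, n: int) -> Limits:
    """Give every container the full task limit."""
    return [task_limit] * n


def spread_evenly(task_limit: int, n: int) -> Limits:
    """Divide the task limit evenly among containers."""
    return [task_limit // n] * n


def pod_limit(limits: Limits) -> Optional[int]:
    """Sum of container limits, or None if any container is unlimited."""
    if any(limit is None for limit in limits):
        return None
    return sum(limits)


task_limit, n_steps = 1024, 4
print(pod_limit(only_first(task_limit, n_steps)))       # no effective limit
print(pod_limit(every_container(task_limit, n_steps)))  # n times the task limit
print(spread_evenly(task_limit, n_steps))               # small per-step limits
```

With these numbers, the first strategy yields no pod limit at all, the second yields 4096 MiB, and the third caps each step at 256 MiB, so a single step that needs 600 MiB would be killed even though the task as a whole is under its limit.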

One way around this is to support only Task-level resource requests, not limits.

Another option is to run Steps as init containers rather than containers (instead of supporting this feature). As a result, k8s will correctly determine the effective pod resource requirements. We would have to rework our entrypoint but I imagine it would be doable. However, we would have no way to support Sidecars.
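A rough sketch of why init containers would help: Kubernetes takes the pod's effective request to be the larger of the highest init container request and the sum of the app container requests. The function below is a simplified model of that rule (it ignores pod overhead and sidecar-style init containers), and the step values are hypothetical.

```python
def effective_request(init_requests: list[int], app_requests: list[int]) -> int:
    """Larger of: highest init container request, sum of app container requests."""
    return max(max(init_requests, default=0), sum(app_requests))


# Hypothetical per-step CPU requests in millicores.
step_requests = [250, 2000, 250]

as_app_containers = effective_request([], step_requests)
as_init_containers = effective_request(step_requests, [])

print(as_app_containers, as_init_containers)
```

Run as app containers, the steps are scheduled against the sum of their requests; run as init containers, only the largest single request counts, which matches how sequential steps actually consume resources.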

@dibyom
Member

dibyom commented Mar 14, 2022

> Another option is to run Steps as init containers rather than containers (instead of supporting this feature). As a result, k8s will correctly determine the effective pod resource requirements. We would have to rework our entrypoint but I imagine it would be doable. However, we would have no way to support Sidecars.

We used to run Tekton in init containers. Init containers have some nice characteristics but also come with a bunch of drawbacks; see #224.

@roulettedares

imo, #4176 made defining cpu/mem requests much less intuitive. assuming steps in a task will never run in parallel, task-level resource definition would definitely alleviate some of that pain.

@lbernick
Member Author

design doc

Must join tekton-dev or tekton-users to view/comment

@lbernick
Member Author

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Mar 28, 2022
@lbernick
Member Author

lbernick commented Apr 8, 2022

Opened TEP-0104 to propose this feature.

@dibyom dibyom added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 25, 2022
@austinzhao-go
Contributor

Hi @lbernick, perhaps I could pick this issue up?

I'm digesting the required logic from the TEP and aim to raise a draft PR this week.

@lbernick
Member Author

lbernick commented May 9, 2022

/assign @austinzhao-go

Thanks Austin!

@lbernick
Member Author

I want to amend what I said in #4470 (comment); since the container limits are enforced on individual containers by the container runtime, and pod effective limits seem less relevant than pod effective requests, I've updated the design in tektoncd/community#703 to propose support for Task-level limits as well.

@austinzhao-go
Contributor

austinzhao-go commented May 11, 2022

Thanks for this update, @lbernick.

Just to confirm my understanding of the update:

  • A task-level limit field will be added and written at the step level (the pod will pass the scheduler check with a higher total amount, but the limit is still enforced at the container level at runtime).
  • A resources field (requests + limits) will be added in 2 places:
    • Task.spec.resources
    • PipelineRun.TaskRunSpecs.resources (I think this will override the Task.* one if both are specified, since the runtime value takes precedence)
  • Sidecar resource requirements will be specified separately from the task level, per sidecar (so this stays as it is now, and there will NOT be a resources field under sidecars that would apply to all sidecars).

Unchanged:

  • The final requirements (after applying the task-level resource requirements) will be written into the last step to take effect.
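One possible reading of the precedence and limit points above, as a sketch only: the function names are hypothetical, the override is assumed to replace the Task-level value wholesale rather than merge field by field, and the limit is assumed to be copied to every step container. Sidecars are left out because they keep their own settings.

```python
from typing import Optional


def resolve_task_limit(task_level: Optional[int], run_override: Optional[int]) -> Optional[int]:
    """Prefer the run-time override when set, else fall back to the Task-level value."""
    return run_override if run_override is not None else task_level


def step_limits(task_limit: int, step_names: list[str]) -> dict[str, int]:
    """Assumption: every step container receives the full task-level limit."""
    return {name: task_limit for name in step_names}


# Hypothetical memory limits in MiB.
resolved = resolve_task_limit(task_level=1024, run_override=2048)
limits = step_limits(resolved, ["clone", "build", "upload"])

print(limits)
print(sum(limits.values()))  # total the pod is checked against
```

Under these assumptions the pod is checked against three times the task limit, while each step is still individually capped at the task limit by the container runtime.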

@lbernick
Member Author

Yup that's correct @austinzhao-go !

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2022
@roulettedares

/remove-lifecycle stale
we really need an intuitive way to set requests/limits for the entire pod/task

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2022
@lbernick
Member Author

lbernick commented Aug 9, 2022

/lifecycle frozen

This should be ready soon :)

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 9, 2022
@lbernick
Member Author

This feature is available on main now and will be out in the next release. We won't be implementing Pipeline-level resource requirements for the foreseeable future (it's hard to have a non-confusing API for this when some components are running in parallel and some sequentially), although if there's a strong use case for it we can reconsider.


5 participants