[TEP-0044]: Controller role in scheduling TaskRuns
This commit addresses the question of whether Tekton should be responsible for determining
which TaskRuns are executed in one pod. It proposes leaving this functionality as an option
for a later iteration of the proposal, and modifies the "TaskGroup" proposal to specify
that for the first implementation, all TaskRuns in a TaskGroup should be run in one pod.
lbernick authored and tekton-robot committed Feb 14, 2022
1 parent 9c3ea19 commit 3fee120
Showing 2 changed files with 25 additions and 8 deletions.
31 changes: 24 additions & 7 deletions teps/0044-data-locality-and-pod-overhead-in-pipelines.md
@@ -2,7 +2,7 @@
status: proposed
title: Data Locality and Pod Overhead in Pipelines
creation-date: '2021-01-22'
-last-updated: '2022-02-07'
+last-updated: '2022-02-09'
authors:
- '@bobcatfish'
- '@lbernick'
@@ -130,6 +130,8 @@ with Tasks that fetch inputs (e.g. git clone) or push outputs (e.g. docker push)
or even worse, recursion (Task 1 uses Task 2 uses Task 1...)
- Replacing all functionality that was provided by PipelineResources.
See [TEP-0074](./0074-deprecate-pipelineresources.md) for the deprecation plan for PipelineResources.
+- Building functionality into Tekton to determine which Tasks should be combined together, as opposed to letting a user configure this.
+  We can explore providing this functionality in a later iteration of this proposal.

### Use Cases

@@ -280,6 +282,19 @@ We could work around this limitation via a few options:
2. Not allowing Steps within hermetic Tasks to communicate with each other.
3. Requiring that hermetic Tasks not execute in parallel with other Tasks run in the same pod.

+### Controller role in scheduling TaskRuns
+Some solutions to this problem let the user configure which TaskRuns they would like executed on one pod,
+while others leave it to the controller to determine which TaskRuns should share a pod.
+
+For example, if we create a TaskGroup abstraction, we could require that all Tasks in a TaskGroup execute on the same
+pod, or let the controller decide how to schedule the TaskRuns in a TaskGroup. Similarly, we could provide an option to execute
+a Pipeline in a pod, or an option that allows the PipelineRun controller to determine which TaskRuns should be grouped.
+
+We should first tackle the complexity of running multiple TaskRuns on one pod before tackling the complexity of determining
+which TaskRuns should be scheduled together. A first iteration of this proposal should require the user to specify when they would like
+TaskRuns to be combined. After experimentation and user feedback, we can explore adding an option that relies on the
+controller to make this decision.
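
For illustration, a user-specified grouping along these lines might look roughly like the sketch below. This is hypothetical: the `TaskGroup` kind, its API version, and every field shown are assumptions for discussion, not existing Tekton API surface.

```yaml
# Hypothetical sketch only: TaskGroup and all fields below are assumed,
# not part of Tekton's API. The user, not the controller, opts in to
# one-pod scheduling.
apiVersion: tekton.dev/v1alpha1
kind: TaskGroup
metadata:
  name: build-and-test
spec:
  tasks:
    - name: clone
      taskRef:
        name: git-clone
    - name: test
      taskRef:
        name: go-test
      runAfter:
        - clone
  # A first iteration would support only this value; a later iteration
  # could add a value that lets the controller choose the grouping.
  scheduling: SinglePod
```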

### Additional Design Considerations
- Executing an entire Pipeline in a pod, as compared to executing multiple Tasks in a pod, may pave the way for supporting
[local execution](https://github.com/tektoncd/pipeline/issues/235).
@@ -795,18 +810,20 @@ In this approach we create a new Tekton type called a "TaskGroup", which can be
TaskGroups may be embedded in Pipelines. We could create a new TaskGroup controller or use the existing TaskRun controller
to schedule a TaskGroup.

-The controller would be responsible for creating one TaskRun per Task in the TaskGroup, and determining
-at runtime how to schedule each TaskRun. For example, it could create a pod and schedule all the TaskRuns on it,
-or, if a single pod running all the Tasks is too large to be scheduled, it could split the TaskRuns between multiple pods.
-The controller would be responsible for reconciling both the TaskGroup and the TaskRuns created from the TaskGroup.
-We could introduce configuration options to specify whether the controller should attempt to split up TaskRuns
-or simply fail if a single pod wouldn't be schedulable.
+The controller would be responsible for creating one TaskRun per Task in the TaskGroup, and for scheduling each of these
+TaskRuns in the same pod. It would also be responsible for reconciling both the TaskGroup and the TaskRuns
+created from the TaskGroup.

The controller would need to determine how many TaskRuns are needed when the TaskGroup is first reconciled, due to
[limitations associated with dynamically creating Tasks](#dynamically-created-tasks-in-pipelines).
When the TaskGroup is first reconciled, it would create all TaskRuns needed, with those that are not ready to execute marked as "pending",
and a pod with one container per TaskRun. The TaskGroup would store references to any TaskRuns created, and Task statuses would be stored on the TaskRuns.
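
As a concrete sketch of that first reconcile, the controller might create objects like the following. The owner shape and the pending marker are both assumptions: Tekton has a pending mechanism for PipelineRuns (`PipelineRunPending`), but a `TaskRunPending` value like the one shown here is hypothetical.

```yaml
# Hypothetical sketch: one of the TaskRuns a TaskGroup controller might
# create on first reconcile. The ownerReference ties it back to the
# (assumed) TaskGroup; spec.status uses an assumed "TaskRunPending"
# value to keep the TaskRun from executing until "clone" completes.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: build-and-test-test
  ownerReferences:
    - apiVersion: tekton.dev/v1alpha1   # assumed TaskGroup API version
      kind: TaskGroup
      name: build-and-test
spec:
  taskRef:
    name: go-test
  status: TaskRunPending  # hypothetical marker, cleared when "clone" succeeds
```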

+In a future version of this solution, we could explore allowing the TaskGroup/TaskRun controller to determine how to schedule TaskRuns.
+For example, it could create a pod and schedule all the TaskRuns on it, or, if a single pod running all the Tasks is too large
+to be scheduled, it could split the TaskRuns between multiple pods. We could introduce configuration options to specify whether
+the controller should attempt to split up TaskRuns or simply fail if a single pod wouldn't be schedulable.
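
If that future option were pursued, one plausible way to surface the choice is an installer-level setting; the ConfigMap name below follows Tekton's existing `feature-flags` convention, but the key and its values are purely hypothetical.

```yaml
# Hypothetical sketch: this key does not exist in Tekton's feature-flags
# ConfigMap today; it illustrates how a split-vs-fail policy could be exposed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # "split-pods": fall back to multiple pods when one pod is unschedulable
  # "fail": fail the TaskGroup rather than splitting it across pods
  taskgroup-unschedulable-pod-behavior: "fail"
```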

Pros:
* Creating a single TaskRun for each Task would allow individual Task statuses to be surfaced separately.
* Allows us to choose which Pipeline features to support, and marks a clear distinction for users between supported and unsupported features.
Expand Down
2 changes: 1 addition & 1 deletion teps/README.md
@@ -196,7 +196,7 @@ This is the complete list of Tekton teps:
|[TEP-0040](0040-ignore-step-errors.md) | Ignore Step Errors | implemented | 2021-08-11 |
|[TEP-0041](0041-tekton-component-versioning.md) | Tekton Component Versioning | implementable | 2021-04-26 |
|[TEP-0042](0042-taskrun-breakpoint-on-failure.md) | taskrun-breakpoint-on-failure | implemented | 2021-12-10 |
-|[TEP-0044](0044-data-locality-and-pod-overhead-in-pipelines.md) | Data Locality and Pod Overhead in Pipelines | proposed | 2022-02-07 |
+|[TEP-0044](0044-data-locality-and-pod-overhead-in-pipelines.md) | Data Locality and Pod Overhead in Pipelines | proposed | 2022-02-09 |
|[TEP-0045](0045-whenexpressions-in-finally-tasks.md) | WhenExpressions in Finally Tasks | implemented | 2021-06-03 |
|[TEP-0046](0046-finallytask-execution-post-timeout.md) | Finally tasks execution post pipelinerun timeout | implemented | 2021-12-14 |
|[TEP-0047](0047-pipeline-task-display-name.md) | Pipeline Task Display Name | proposed | 2021-02-10 |
