From d1d8a0b27225f97f67daf1622fc3f97ba8a3f07e Mon Sep 17 00:00:00 2001
From: Jerop
Date: Wed, 6 Oct 2021 14:33:11 -0400
Subject: [PATCH] TEP-0090: Looping [Problem Statement]

This change adds the problem statement for Looping. It scopes the
problem, describes the use cases, and identifies the goals, the
requirements for the solution, and related work in other continuous
delivery systems.

Today, users cannot supply varying `Parameters` to the same `Task` or
`Custom Task` - that is, fan out their `Tasks` or `Custom Tasks`. In this
TEP, we aim to provide a way to run the same `Task` or `Custom Task` with
varying `Parameters` by spinning up a `TaskRun` or `Run` for each
`Parameter` value in a loop. This looping construct is aimed at improving
the composability, scalability, flexibility and reusability of *Tekton
Pipelines*.

References:
- [Task Loops Experimental Project][task-loops]
- Issues:
  - https://github.com/tektoncd/pipeline/issues/2050
  - https://github.com/tektoncd/pipeline/issues/4097

[task-loops]: https://github.com/tektoncd/experimental/tree/main/task-loops
---
 teps/0090-looping.md | 425 +++++++++++++++++++++++++++++++++++++++++++
 teps/README.md       |   1 +
 2 files changed, 426 insertions(+)
 create mode 100644 teps/0090-looping.md

diff --git a/teps/0090-looping.md b/teps/0090-looping.md
new file mode 100644
index 000000000..e3e1b0c40
--- /dev/null
+++ b/teps/0090-looping.md
@@ -0,0 +1,425 @@
+---
+status: proposed
+title: Looping
+creation-date: '2021-10-13'
+last-updated: '2021-10-13'
+authors:
+- '@jerop'
+- '@pritidesai'
+---
+
+# TEP-0090: Looping
+
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+  - [Use Cases](#use-cases)
+    - [Parallel Kaniko Build](#parallel-kaniko-build)
+    - [Dynamic Parallel Docker Build](#dynamic-parallel-docker-build)
+    - [Fan Out Vault Reading](#fan-out-vault-reading)
+    - [Multiple Testing Strategies](#multiple-testing-strategies)
+    - [Test Sharding](#test-sharding)
+  - [Requirements](#requirements)
+  - [Related Work](#related-work)
+    - [GitHub Actions](#github-actions)
+    - [Argo Workflows](#argo-workflows)
+    - [Ansible](#ansible)
+- [References](#references)
+
+
+## Summary
+
+Today, users cannot supply varying `Parameters` to the same `Task` or `Custom Task` - that is, fan out their `Tasks` or
+`Custom Tasks`. In this TEP, we aim to provide a way to run the same `Task` or `Custom Task` with varying `Parameters`
+by spinning up a `TaskRun` or `Run` for each `Parameter` value in a loop. This looping construct is aimed at improving the
+composability, scalability, flexibility and reusability of *Tekton Pipelines*.
+
+## Motivation
+
+Users can specify `Parameters`, such as artifact names, that they want to supply to [`Tasks`][tasks-docs] and
+[`Custom Tasks`][custom-tasks-docs] at execution time. However, they don't have a way to supply varying `Parameters` to
+the same `Task` or `Custom Task`.
+
+Today, users would have to duplicate that `Task` or `Custom Task` in the `Pipeline` specification as many times as the
+number of `Parameter` values that they want to pass in. This creates some limitations and challenges:
+- It is tedious and does not scale well because users have to add a `Task` entry for each additional `Parameter`.
+- Duplicating the `Task` specifications is error-prone, and the resulting errors may be challenging to debug.
+- It cannot handle a dynamic set of `Parameters`, which makes the `Pipeline` less reusable.
+
+A common scenario is that [a user needs to build multiple images][kaniko-example-1] from one repository using the
+[kaniko][kaniko-task] `Task` from the *Tekton Catalog*. Let's assume it's three images. The user would have to specify
+the `Pipeline` with the kaniko `Task` duplicated, as follows:
+
+```yaml
+apiVersion: tekton.dev/v1beta1
+kind: Pipeline
+metadata:
+  name: kaniko-pipeline
+spec:
+  workspaces:
+    - name: shared-workspace
+  params:
+    - name: image-1
+      description: reference of the first image to build
+    - name: image-2
+      description: reference of the second image to build
+    - name: image-3
+      description: reference of the third image to build
+  tasks:
+    - name: fetch-repository
+      taskRef:
+        name: git-clone
+      workspaces:
+        - name: output
+          workspace: shared-workspace
+      params:
+        - name: url
+          value: https://github.com/tektoncd/pipeline
+        - name: subdirectory
+          value: ""
+        - name: deleteExisting
+          value: "true"
+    - name: kaniko-1
+      taskRef:
+        name: kaniko
+      runAfter:
+        - fetch-repository
+      workspaces:
+        - name: source
+          workspace: shared-workspace
+      params:
+        - name: IMAGE
+          value: $(params.image-1)
+    - name: kaniko-2
+      taskRef:
+        name: kaniko
+      runAfter:
+        - fetch-repository
+      workspaces:
+        - name: source
+          workspace: shared-workspace
+      params:
+        - name: IMAGE
+          value: $(params.image-2)
+    - name: kaniko-3
+      taskRef:
+        name: kaniko
+      runAfter:
+        - fetch-repository
+      workspaces:
+        - name: source
+          workspace: shared-workspace
+      params:
+        - name: IMAGE
+          value: $(params.image-3)
+```
+
+As shown in the above example, the limitations and challenges include:
+- the user would have to add another `Task` entry for each additional image to build.
+- the user can easily make errors while duplicating the `Task` specifications.
+- the `Pipeline` cannot handle a dynamic set of images, which makes it less reusable.
+
+The `Parameters` used in the above example are user-defined. In some cases, the `Parameter` values may be the `Result`
+of a previous `Task` in the `Pipeline`. For example, a user [needs to build a dynamic set of images][kaniko-example-2] and
+shares their current experience:
+> "Right now I'm doing all of this by just having a statically defined single `Pipeline` with a `Task` and then
+> delegating to code/loops within that single `Task` to achieve the `N` things I want to do. This works, but then
+> I'd prefer the concept of a single Task does a single thing, rather than overloading it like this. Especially
+> when viewing it in the dashboard etc, things get lost" ~ [bitsofinfo][kaniko-example-2]
+
+We need to address these challenges and limitations to improve the composability, scalability, flexibility,
+reusability and debuggability of *Tekton Pipelines*.
+
+**In this TEP, we aim to provide a way to run the same `Task` or `Custom Task` with varying `Parameters` by spinning up
+a `TaskRun` or `Run` for each `Parameter` value in a loop. This looping construct is aimed at improving the composability,
+scalability, flexibility and reusability of *Tekton Pipelines***.
+
+### Goals
+
+- Executing `Tasks` and `Custom Tasks` in a loop with varying `Parameter` values.
+- Configuring whether the `TaskRuns` and `Runs` created in the loop execute sequentially or in parallel.
+
+### Non-Goals
+
+- Terminating early when the `Tasks` or `Custom Tasks` are executed in parallel - in-progress `TaskRuns` and `Runs` have
+to complete execution before termination.
+- Ignoring a failure when the `Tasks` or `Custom Tasks` are executed sequentially - addressed in [TEP-0050][tep-0050].
+
+### Use Cases
+
+#### Parallel Kaniko Build
+
+As a `Pipeline` author, I [need to build multiple images][kaniko-example-1] from one repository using the same `Task`.
+I choose to use the [*kaniko*][kaniko-task] `Task` from the *Tekton Catalog*. Let's assume it's three images. I want to
+pass in varying `Parameter` values for `IMAGE` to create three `TaskRuns`, one for each image.
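
To make this fan-out concrete, the following minimal Python sketch turns one array of `IMAGE` values into one run specification per value. It is purely illustrative: the `fan_out` helper, the dictionary shape, and the `<task>-<index>` naming scheme are hypothetical assumptions, not Tekton's API and not the design this TEP will propose.

```python
from typing import Any, Dict, List


def fan_out(task_name: str, task_ref: str, param: str, values: List[str]) -> List[Dict[str, Any]]:
    """Return one hypothetical run specification per value of an array Parameter."""
    # Hypothetical shape and naming: "<task>-<index>" is an illustrative
    # assumption, not how Tekton actually names TaskRuns.
    return [
        {
            "name": f"{task_name}-{index}",
            "taskRef": {"name": task_ref},
            "params": [{"name": param, "value": value}],
        }
        for index, value in enumerate(values)
    ]


# Hypothetical image references standing in for the three images in this use case.
images = ["image-1", "image-2", "image-3"]
runs = fan_out("kaniko", "kaniko", "IMAGE", images)
for run in runs:
    print(run["name"], run["params"][0]["value"])
```

Three values yield three entries, mirroring the three duplicated `kaniko` entries in the `Pipeline` from the Motivation section; a fourth image would only require a fourth array element rather than a fourth `Task` entry.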
+
+```
+                          clone
+                            |
+                            v
+        -----------------------------------------
+        |                   |                   |
+        v                   v                   v
+ko-build-image-1    ko-build-image-2    ko-build-image-3
+```
+
+In other circumstances, the `Parameter` values for `IMAGE` may be produced by a previous `Task` in the `Pipeline`
+instead of being supplied by me.
+
+Read more in [user experience report #1][kaniko-example-1] and [user experience report #2][kaniko-example-2].
+
+#### Dynamic Parallel Docker Build
+
+As a `Pipeline` author, I have several Dockerfiles in my repository.
+
+```
+/ docker / Dockerfile
+  python / Dockerfile
+  Ubuntu / Dockerfile
+...
+```
+
+I have a *clone* `Task` that fetches the repository to a shared `Workspace`. Then I have a *get-dir* `Task` that
+produces a `Result` array containing the directory names of the Dockerfiles. Finally, I want to dynamically generate
+parallel *docker build* `Tasks`, each of which takes one Dockerfile and runs `docker build` and `docker push`.
+
+```
+                          clone
+                            |
+                            v
+                         get-dir
+                            |
+                            v
+        -----------------------------------------
+        |                   |                   |
+        v                   v                   v
+ docker-build-1      docker-build-2      docker-build-3
+```
+
+Read more in the [user experience report][docker-example].
+
+#### Fan Out Vault Reading
+
+As a `Pipeline` author, I have a file in my repository with several Vault paths.
+
+```text
+path1
+path2
+path3
+...
+```
+
+I have a *vault-read* `Task` that I need to run for every entry in the file to read the secrets at each path.
+As such, I need to fan out the *vault-read* `Task` N times, where N is the number of Vault paths in my file.
+
+```
+                          clone
+                            |
+                            v
+                     get-vault-paths
+                            |
+                            v
+        -----------------------------------------
+        |                   |                   |
+        v                   v                   v
+  vault-read-1        vault-read-2        vault-read-3
+```
+
+Read more in the [user experience report][vault-example].
+
+#### Multiple Testing Strategies
+
+As a `Pipeline` author, I have a file that configures the test types that I want to run.
+
+```text
+code-analysis
+unit-tests
+e2e-tests
+...
+```
+
+I have a *test* `Task` that runs tests of the type given by a `Parameter`. I need to run this *test* `Task` once for
+each test type defined in my repository; the list of test types is read from the file by the
+*tests-selector* `Task`.
+
+```
+                          clone
+                            |
+                            v
+                     tests-selector
+                            |
+                            v
+        -----------------------------------------
+        |                   |                   |
+        v                   v                   v
+  code-analysis        unit-tests           e2e-tests
+```
+
+#### Test Sharding
+
+As a `Pipeline` author, I have a large test suite that's slow (e.g. browser-based tests) and I need to speed it up.
+I need to split the test suite into groups, run each group separately, then combine the results.
+
+```
+[
+  [test_a, test_b],
+  [test_c, test_d],
+  [test_e, test_f],
+]
+```
+
+I choose to use the [Golang Test][golang-test] `Task` from the *Tekton Catalog*. Let's assume we've updated it to
+support running a subset of tests. Then, I have a *tests-sharding* `Task` that divides the tests across shards
+and produces, for each shard, the list of names of the tests it should run.
+
+```
+                          clone
+                            |
+                            v
+                     tests-sharding
+                            |
+                            v
+        -----------------------------------------
+        |                   |                   |
+        v                   v                   v
+     test-ab             test-cd             test-ef
+```
+
+
+### Requirements
+
+- Users should be able to pass in an array `Parameter` to a `Task` or `Custom Task` and generate as many `TaskRuns` or
+`Runs` as there are elements in the array `Parameter`.
+- Users should be able to pass in several array `Parameters` to a `Task` or `Custom Task` and generate as many `TaskRuns`
+or `Runs` as there are combinations of values across the array `Parameters`.
+- Users should be able to configure whether the loop is executed sequentially or in parallel.
+- Users should be able to control the concurrency limit (the maximum number of `TaskRuns` or `Runs` executing at a time).
+
+### Related Work
+
+The looping construct is related to `for` loops, which are available in most programming languages. In this section, we
+explore related work on looping constructs in other continuous delivery systems.
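
The combination requirement listed above behaves like nested `for` loops over the array `Parameters`. The minimal Python sketch below assumes hypothetical `Parameter` names (`platform`, `test-type`) and values. It shows that the number of `TaskRuns` or `Runs` would be the product of the array lengths, with each one receiving exactly one combination of values. `itertools.product` stands in for the nested loops here; the actual expansion order in Tekton is not specified by this TEP.

```python
import itertools
from typing import Dict, Iterator, List


def combinations(array_params: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one binding per combination of values across array Parameters.

    Equivalent to nested for loops over the arrays, with the last array varying fastest.
    """
    names = list(array_params)
    for values in itertools.product(*(array_params[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical Parameters; each binding would become one TaskRun or Run.
params = {
    "platform": ["linux", "windows"],
    "test-type": ["code-analysis", "unit-tests", "e2e-tests"],
}
bindings = list(combinations(params))
print(len(bindings))  # 6, i.e. 2 x 3
print(bindings[0])  # {'platform': 'linux', 'test-type': 'code-analysis'}
```

With a single array `Parameter`, the same function produces one binding per element, which matches the first requirement.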
+
+#### GitHub Actions
+
+GitHub Actions allows users to define a matrix of job configurations, which creates a job for each configuration
+after substituting the matrix variables into it. It also allows users to include or exclude combinations in the build matrix.
+
+For example:
+
+```yaml
+runs-on: ${{ matrix.os }}
+strategy:
+  matrix:
+    os: [macos-latest, windows-latest, ubuntu-18.04]
+    node: [8, 10, 12, 14]
+    exclude:
+      # excludes node 8 on macOS
+      - os: macos-latest
+        node: 8
+    include:
+      # includes node 15 on ubuntu-18.04
+      - os: ubuntu-18.04
+        node: 15
+```
+
+GitHub Actions workflow syntax also allows users to:
+- cancel in-progress jobs if one of the matrix jobs fails (`strategy.fail-fast`)
+- specify the maximum number of jobs to run in parallel (`strategy.max-parallel`)
+
+Read more in the [documentation][github-actions].
+
+#### Argo Workflows
+
+Argo Workflows allows users to iterate over:
+- a list of items as static inputs
+- a list of sets of items as static inputs
+- a parameterized list of items or list of sets of items
+- a dynamic list of items or list of sets of items
+
+Here's an example from the [documentation][argo-workflows]:
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Workflow
+metadata:
+  generateName: loops-param-result-
+spec:
+  entrypoint: loop-param-result-example
+  templates:
+  - name: loop-param-result-example
+    steps:
+    - - name: generate
+        template: gen-number-list
+    # Iterate over the list of numbers generated by the generate step above
+    - - name: sleep
+        template: sleep-n-sec
+        arguments:
+          parameters:
+          - name: seconds
+            value: "{{item}}"
+        withParam: "{{steps.generate.outputs.result}}"
+
+  # Generate a list of numbers in JSON format
+  - name: gen-number-list
+    script:
+      image: python:alpine3.6
+      command: [python]
+      source: |
+        import json
+        import sys
+        json.dump([i for i in range(20, 31)], sys.stdout)
+
+  - name: sleep-n-sec
+    inputs:
+      parameters:
+      - name: seconds
+    container:
+      image: alpine:latest
+      command: [sh, -c]
+      args: ["echo sleeping for {{inputs.parameters.seconds}} seconds; sleep {{inputs.parameters.seconds}}; echo done"]
+```
+
+Read more in the [documentation][argo-workflows].
+
+#### Ansible
+
+Ansible allows users to execute a task multiple times using the `loop`, `with_<lookup>` and `until` keywords.
+
+For example:
+
+```yaml
+- name: Show the environment
+  ansible.builtin.debug:
+    msg: "The environment is {{ item }}"
+  loop:
+    - staging
+    - qa
+    - production
+```
+
+Read more in the [documentation][ansible].
+
+## References
+
+- [Task Loops Experimental Project][task-loops]
+- Issues:
+  - [#2050: `Task` Looping inside `Pipelines`][issue-2050]
+  - [#4097: List of `Results` of a `Task`][issue-4097]
+
+[task-loops]: https://github.com/tektoncd/experimental/tree/main/task-loops
+[issue-2050]: https://github.com/tektoncd/pipeline/issues/2050
+[issue-4097]: https://github.com/tektoncd/pipeline/issues/4097
+[tasks-docs]: https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md
+[custom-tasks-docs]: https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#using-custom-tasks
+[kaniko-example-1]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-625423085
+[kaniko-task]: https://github.com/tektoncd/catalog/tree/main/task/kaniko/0.5
+[kaniko-example-2]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-671959323
+[docker-example]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-814847519
+[vault-example]: https://github.com/tektoncd/pipeline/issues/2050#issuecomment-841291098
+[tep-0050]: https://github.com/tektoncd/community/blob/main/teps/0050-ignore-task-failures.md
+[argo-workflows]: https://github.com/argoproj/argo-workflows/blob/7684ef4a0c5f57e8723dc8e4d3a17246f7edc2e6/examples/README.md#loops
+[github-actions]: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions
+[ansible]: https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#loops
+[golang-test]: https://github.com/tektoncd/catalog/tree/main/task/golang-test/0.2
diff --git a/teps/README.md b/teps/README.md
index d0c9f84e2..86ead0d4d 100644
--- a/teps/README.md
+++ b/teps/README.md
@@ -225,3 +225,4 @@ This is the complete list of Tekton teps:
 |[TEP-0081](0081-add-chains-subcommand-to-the-cli.md) | Add Chains sub-command to the CLI | proposed | 2021-08-31 |
 |[TEP-0084](0084-endtoend-provenance-collection.md) | end-to-end provenance collection | proposed | 2021-09-16 |
 |[TEP-0085](0085-per-namespace-controller-configuration.md) | Per-Namespace Controller Configuration | proposed | 2021-10-14 |
+|[TEP-0090](0090-looping.md) | Looping | proposed | 2021-10-13 |