Task parallelism and Regional clusters - supported? #3563

jlpettersson · 2020-11-25T23:01:08Z

Opening this for discussions.

Task Parallelism - should Tekton support parallel Tasks?

Parallel Tasks in a Pipeline are a way to get faster feedback - do more in less time. Most pipeline products support parallel Tasks. This is trivial in single-node solutions, but trickier in cluster solutions. In addition, Tasks likely need to share files through the workspace functionality.

Original issue: #2586

Run Pipelines in regional clusters - should Tekton support it?

Cloud environments are growing in popularity and "Availability" is becoming more important - organizations are running their workloads in regional clusters. In addition, the role of CI/CD is becoming more important (see e.g. the Accelerate book) - it is now also used to provision infrastructure using "Infrastructure as Code" - so organizations are becoming more dependent on "pipeline systems". A problem with Tasks in a regional cluster arises when a Task mounts more than one PVC and the PVCs are not located in the same AZ.

Problem issue: #2540
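
A minimal sketch of why this is a problem, assuming each PVC is backed by a zonal volume that can only be mounted from nodes in its own AZ. The PVC names, zone names and the `candidate_zones` helper are hypothetical and only illustrate the constraint:

```python
def candidate_zones(pvc_zone, mounted_pvcs, cluster_zones):
    """Return the zones in which a pod mounting all of `mounted_pvcs` could run.

    pvc_zone maps a PVC name to the single AZ its volume lives in
    (assumption: zonal, non-replicated storage).
    """
    zones = set(cluster_zones)
    for name in mounted_pvcs:
        # Each zonal PVC narrows the pod down to that PVC's zone.
        zones &= {pvc_zone[name]}
    return zones


cluster_zones = ["zone-a", "zone-b", "zone-c"]
pvc_zone = {"source": "zone-a", "cache": "zone-b"}

print(candidate_zones(pvc_zone, ["source"], cluster_zones))           # {'zone-a'}
print(candidate_zones(pvc_zone, ["source", "cache"], cluster_zones))  # set()
```

With one PVC there is always a zone left to schedule in. With two PVCs in different AZs the set of candidate zones is empty, so under this assumption the pod has nowhere to run.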

Notes

Now, the main problem with the two things above is the use of Kubernetes Persistent Volumes.

The current solution for task parallelism is the Affinity Assistant, but a better solution is a custom scheduler #3052 - work has started in https://github.com/tektoncd/experimental/tree/master/scheduler
The idea behind both of these implementations is to schedule the pods that share a PVC to the node where the PVC is.
That idea limits each Task to one PVC (unless e.g. #3559 is implemented), which also avoids the problem with running Tasks in a regional cluster.
Other ideas are welcome.
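
The placement idea above can be sketched as a toy model. This is not how the Affinity Assistant or the custom scheduler is implemented; `place_pods`, the pod names and the node names are hypothetical, and each pod is assumed to use exactly one PVC:

```python
def place_pods(pod_pvc, free_nodes):
    """Place every pod on the node already chosen for its PVC.

    pod_pvc maps a pod name to the one PVC it mounts. The first pod that
    uses a PVC picks the next free node; later pods sharing that PVC follow it.
    """
    pvc_node = {}
    placement = {}
    nodes = iter(free_nodes)
    for pod, pvc in pod_pvc.items():
        if pvc not in pvc_node:
            pvc_node[pvc] = next(nodes)  # raises StopIteration if no node is left
        placement[pod] = pvc_node[pvc]
    return placement


pods = {"task-b": "workspace-1", "task-c": "workspace-1", "task-d": "workspace-2"}
print(place_pods(pods, ["node-1", "node-2"]))
# {'task-b': 'node-1', 'task-c': 'node-1', 'task-d': 'node-2'}
```

Because a pod can only follow one PVC to one node, the model only works when each Task uses a single PVC, which is the limitation described above.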

Some assumptions that have been made:

ReadWriteOnce access mode is by far the most common, see PV Access Modes
There are users of Tekton that want to run their workloads in regional clusters
Storage Classes that are synchronously replicated to other AZs are rarely used - if used, they are twice as expensive.


imjasonh · 2020-11-26T03:03:29Z

My two cents:

Task Parallelism - should Tekton support parallel Tasks?

Tekton cannot guarantee parallel execution of Tasks. It should whenever possible attempt to schedule two Pods such that they run at the same time, but many things outside its control (Kubernetes' scheduler, available resources, etc.) can prevent parallel execution. Scheduling PVCs is a prime example, but also available CPU and RAM, a namespace's pod quota, and others, are outside Tekton's control. So Tekton should never guarantee that two Tasks run in parallel.

(As an aside, Tekton should guarantee concurrent tasks -- that is, that two tasks be scheduled to run at the same time, whether or not constraints of the underlying systems prevent them from actually executing "in parallel". Rob Pike has a good talk about the difference between parallelism and concurrency, FWIW: https://vimeo.com/49718712)

Run Pipelines in regional clusters - should Tekton support it?

Yes, as well as it can. Tekton shouldn't preclude regional clusters, or a priori any other future exotic cluster configuration that might come up.

I think AA was a step forward, but ultimately its limitations make it an incomplete solution. I have some optimism about the custom scheduler, but I think there's more exploratory work needed there to improve it to the point we can recommend it to users, or even bundle it into Tekton proper. There might very well be other options we haven't considered, and it's possible a custom scheduler isn't sufficient either.

jlpettersson · 2020-11-26T07:54:22Z

(As an aside, Tekton should guarantee concurrent tasks -- that is, that two tasks be scheduled to run at the same time, whether or not constraints of the underlying systems prevent them from actually executing "in parallel". Rob Pike has a good talk about the difference between parallelism and concurrency, FWIW: https://vimeo.com/49718712)

What I mean by parallel Tasks is Tasks that are parallel within the Pipeline graph - even though the processes may execute concurrently. See this Pipeline with parallel tasks (time goes from left to right - the way to draw "concurrency" is to draw the tasks in "parallel"):

        (b)
      / 
  (a) 
      \
        (c)

If Tasks (b) and (c) run concurrently in most cases - I would say that Tekton supports parallel Task execution (as seen in the Pipeline graph).
If Tasks (b) and (c) are forced to run in sequence in most cases - I would say that Tekton does not support parallel Tasks ("parallel" as seen in the Pipeline graph). It is only supported in the pipeline graph - but the user is "tricked" about "parallel" task execution.
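
To make "parallel as seen in the Pipeline graph" concrete, here is a small sketch that groups the tasks of a graph into waves of tasks with no ordering between them. The `concurrency_levels` helper and the dictionary layout are assumptions made for illustration, not Tekton's API:

```python
def concurrency_levels(run_after):
    """Group tasks into waves; tasks in the same wave do not depend on each other.

    run_after maps each task name to the list of tasks it must wait for.
    """
    remaining = {task: set(deps) for task, deps in run_after.items()}
    done = set()
    levels = []
    while remaining:
        # A task is ready once everything it waits for has finished.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in pipeline graph")
        levels.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return levels


pipeline = {"a": [], "b": ["a"], "c": ["a"]}
print(concurrency_levels(pipeline))  # [['a'], ['b', 'c']]
```

The graph only says that (b) and (c) may run at the same time. Whether they actually do depends on where their pods can be scheduled.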

imjasonh · 2020-11-26T12:58:20Z

Because Tekton can't guarantee (b) and (c) run in parallel, we should not make any guarantees that it can.

Parallelism is like caching the result of an expensive operation in this case. It's an optimization we should make whenever it's beneficial, but if a user comes to rely on it for correctness or as a necessary performance improvement, that's going to be a problem. Unfortunately it's hard to tell in some cases whether you secretly depend on cached state until it's not available. 😅

We should set user expectations that even though a Pipeline specifies concurrent tasks, and Tekton will try to run them in parallel, we can't guarantee it. So if your tasks require each other to be running at the same time (e.g., task A runs a container that task B connects to), or if your pipeline requires both tasks to run in parallel to finish under a deadline (e.g., two 9-minute tasks in a pipeline with a 10-minute timeout), then Tekton won't be able to guarantee that your pipeline can succeed.
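
The deadline example works out as follows. This is a back-of-the-envelope sketch that assumes the two tasks are independent and ignores scheduling overhead; `pipeline_duration` is a hypothetical helper:

```python
def pipeline_duration(task_minutes, parallel):
    """Wall-clock minutes for independent tasks.

    If the tasks overlap fully, the longest one decides the total;
    if they are serialized, the durations add up.
    """
    return max(task_minutes) if parallel else sum(task_minutes)


timeout = 10
tasks = [9, 9]
for parallel in (True, False):
    total = pipeline_duration(tasks, parallel)
    verdict = "within timeout" if total <= timeout else "timeout exceeded"
    print(f"parallel={parallel}: {total} min, {verdict}")
# parallel=True: 9 min, within timeout
# parallel=False: 18 min, timeout exceeded
```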

jlpettersson · 2020-11-26T16:26:43Z

Some more sources about why users might want to use parallel tasks

A hot topic amongst continuous delivery pipelines is parallelism: doing a lot of work, perhaps as much as you can with the resources you have, at the same time. In some cases that is triggering jobs and allowing them to run concurrently, other times it is running parts of your pipeline in parallel with the aim of getting feedback sooner and making the most of your resources.

Optimizing your pipeline with parallelism
Reasons you may want to use parallelism to get faster results:

You have an abundance of elastic build workers you can distribute parts of your work over
You have really powerful build machines with lots of CPUs you can utilize
Your pipeline has a lot of IO "waiting"
Your pipeline has integration tests that depend on (slow) external services
When you visualize your pipeline, parts may leap out that you realize you can do in parallel. This may happen as you plan out your pipeline, or perhaps when you are watching things execute and are bothered by hotspots where things seem to slow down

From https://www.cloudbees.com/blog/parallelism-and-distributed-builds-jenkins

Quick feedback in regards to build times is important in Continuous Integration. If builds become too long, it can hurt the rate of software development. There are multiple methods to reduce build times. One commonly suggested method is to parallelize builds.

Since the CI build task is recurring, the total time spent on waiting for it to complete can become a significant amount. Reducing the total build time will increase the potential productivity as more time is freed up [2]. One widely suggested approach to reduce build times is to run tests in parallel in the CI software. When dividing the work of a task into 2 parallel subtasks, the time it takes is in best case cut in half

The literature agrees on the importance of short build times in Continuous Integration. One broadly suggested approach to shorten build times is to parallelize builds.

From The Effects of Parallelizing Builds in Continuous Integration Software

I haven't made any statements about guarantees; I think this is more about providing "good support" or "poor support" for parallel tasks.

ghost · 2020-12-16T17:34:13Z

Task Parallelism - should Tekton support parallel Tasks?

Run Pipelines in regional clusters - should Tekton support it?

I think the answer to both questions is "yes (and it already does)". I understand that there's a really tricky problem that users currently have to work around because of the platform (PVCs in regional clusters) but that doesn't preclude them from either a) modelling parallel Tasks in their Pipelines or b) from running concurrent Tasks in a regional cluster.

I think this is more about providing "good support" or "poor support" for parallel tasks.

Yeah I think I generally agree with this perspective. And to be honest I think we could bump up our level of support a lot just by clearly and concisely documenting the problems users with PVCs in regional clusters will face, and some solutions - manual solutions - they can take to work around those problems. So, for example, document use of nodeSelector / affinity to tie TaskRuns in a PipelineRun to specific Nodes. This would be in addition to the existing docs we provide describing the behaviour of the Affinity Assistant.
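
As a rough sketch of what such a documented manual workaround could look like, the snippet below builds a PipelineRun manifest whose pods are all pinned to one zone with a nodeSelector. It assumes the v1beta1 `spec.podTemplate` field and the well-known `topology.kubernetes.io/zone` node label; the run name, Pipeline name and zone value are hypothetical:

```python
import json


def pin_pipelinerun_to_zone(name, pipeline, zone):
    """Build a PipelineRun manifest whose pods may only run in `zone`."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        "metadata": {"name": name},
        "spec": {
            "pipelineRef": {"name": pipeline},
            # Assumption: the pod template is applied to every TaskRun pod of this run.
            "podTemplate": {
                "nodeSelector": {"topology.kubernetes.io/zone": zone},
            },
        },
    }


manifest = pin_pipelinerun_to_zone("build-run", "build", "europe-west1-b")
print(json.dumps(manifest, indent=2))
```

Pinning a whole run to one zone keeps its pods next to a zonal PVC in that zone, at the cost of giving up the other zones for that run.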

I think this issue is also pushing on something slightly different to just binary "good"/"poor" support too. The questions I keep coming back to are: "how much code and complexity should Pipelines throw at this problem?" and "how opinionated should Pipelines' solutions be?". I'm going to dwell on these questions a bit over the holidays but something that I think about a lot is how solutions to these problems will affect the many groups that use Tekton: Not just folks operating Tekton Pipelines clusters, but also folks building third party tools on top of it, the people writing entire platforms on top of Tekton's components, and the people who are integrating with or conforming to the API without leveraging Tekton's components. There is a really broad set of people and companies with really different requirements for how persistent storage should be handled for their org / product / platform.

Anyway, at the very least I think there's a massive documentation gap here currently and we could more clearly and concisely document how the choice of PVC affects parallelism and how it can be worked around.

jlpettersson · 2021-01-24T20:53:35Z

I am closing this in favor of the design doc: Task parallelism when using workspace

jlpettersson mentioned this issue Nov 25, 2020

Towards v1 API #3548

Closed

17 tasks

vdemeester added the kind/question Issues or PRs that are questions around the project or a particular feature label Nov 30, 2020

bobcatfish mentioned this issue Dec 7, 2020

taskruns will stay in Running(Pending) state if non existing pvc name is given to workspace #2994

Open

jlpettersson closed this as completed Jan 24, 2021

jlpettersson mentioned this issue Jan 30, 2021

Support custom TopologyKey in Affinity Assistant #3731

Closed
