Task parallelism and Regional clusters - supported? #3563

Closed
jlpettersson opened this issue Nov 25, 2020 · 6 comments
Labels
kind/question Issues or PRs that are questions around the project or a particular feature

Comments

@jlpettersson
Member

jlpettersson commented Nov 25, 2020

Opening this for discussions.

Task Parallelism - should Tekton support parallel Tasks?

Running Tasks in parallel in a Pipeline is a way to get faster feedback - to do more in less time. Most pipeline products support parallel Tasks. This is trivial in single-node solutions, but trickier in cluster solutions. In addition, Tasks likely need to share files using the workspace functionality.

Original issue: #2586

Run Pipelines in regional clusters - should Tekton support it?

Cloud environments are growing in popularity, and "Availability" is becoming more important - organizations are running their workloads in regional clusters. In addition, the role of CI/CD is becoming more important (see e.g. the Accelerate book) - it is now also used to provision infrastructure using "Infrastructure as Code", so organizations are becoming more dependent on "pipeline systems". A problem arises in a regional cluster when a Task mounts more than one PVC and the PVCs are not located in the same AZ.

Problem issue: #2540
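
To make the multi-PVC problem concrete, here is a minimal Python sketch. It assumes each PVC is backed by a volume that lives in exactly one zone (typical for zonal ReadWriteOnce storage). The function name, PVC names, and zone names are hypothetical and are not part of Tekton or Kubernetes.

```python
def can_mount_all(pvc_zones):
    """Return True if one pod could mount every PVC in pvc_zones.

    pvc_zones maps a (hypothetical) PVC name to the single zone
    that holds its volume. A pod runs on one node in one zone, so
    all of its zonal volumes must be in that same zone.
    """
    return len(set(pvc_zones.values())) <= 1


# Both volumes are in the same AZ: the pod can be scheduled there.
print(can_mount_all({"source": "zone-a", "cache": "zone-a"}))

# The volumes are in different AZs: no single node can mount both.
print(can_mount_all({"source": "zone-a", "cache": "zone-b"}))
```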

Notes

The main problem with the two things above is the use of Kubernetes Persistent Volumes.

The current solution for task parallelism is the Affinity Assistant, but a better solution is a custom scheduler #3052 - work has started in https://github.com/tektoncd/experimental/tree/master/scheduler
The idea behind both of these implementations is to schedule the pods that share a PVC to the node where the PVC is.
That idea limits the use of PVCs for Tasks to 1 (unless e.g. #3559 is implemented), which also avoids the problem with running tasks in a regional cluster.
Other ideas are welcome.
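
The co-location idea can be sketched as a toy placement function. This is only a simplified model under stated assumptions: each pod uses exactly one PVC, a PVC is "located" on the node where it was first used, and nodes are picked round-robin. It is not how the Affinity Assistant or the experimental scheduler is implemented, and all names are hypothetical.

```python
def place_pods(pod_to_pvc, nodes):
    """Assign pods to nodes so that pods sharing a PVC share a node.

    pod_to_pvc maps a pod name to the one PVC it mounts.
    nodes is a non-empty list of candidate node names.
    """
    pvc_node = {}   # PVC name -> node where the PVC ended up
    placement = {}  # pod name -> chosen node
    for pod, pvc in pod_to_pvc.items():
        if pvc not in pvc_node:
            # First user of this PVC: pick the next node round-robin.
            pvc_node[pvc] = nodes[len(pvc_node) % len(nodes)]
        # Every later pod follows the PVC to its node.
        placement[pod] = pvc_node[pvc]
    return placement


placement = place_pods(
    {"task-b": "ws-1", "task-c": "ws-1", "task-d": "ws-2"},
    ["node-1", "node-2"],
)
print(placement)
```

Because a pod follows its PVC, a pod with two PVCs on different nodes would have nowhere to go, which is why this approach limits a Task to one PVC.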

Some assumptions that have been made:

  • ReadWriteOnce is by far the most common access mode, see PV Access Modes
  • There are users of Tekton that want to run their workloads in regional clusters
  • Storage Classes that are synchronously replicated to other AZs are rarely used - if used, they are twice as expensive.
@imjasonh
Member

My two cents:

Task Parallelism - should Tekton support parallel Tasks?

Tekton cannot guarantee parallel execution of Tasks. It should whenever possible attempt to schedule two Pods such that they run at the same time, but many things outside its control (Kubernetes' scheduler, available resources, etc.) can prevent parallel execution. Scheduling PVCs is a prime example, but also available CPU and RAM, a namespace's pod quota, and others, are outside Tekton's control. So Tekton should never guarantee that two Tasks run in parallel.

(As an aside, Tekton should guarantee concurrent tasks -- that is, that two tasks be scheduled to run at the same time, whether or not constraints of the underlying systems prevent them from actually executing "in parallel". Rob Pike has a good talk about the difference between parallelism and concurrency, FWIW: https://vimeo.com/49718712)

Run Pipelines in regional clusters - should Tekton support it?

Yes, as well as it can. Tekton shouldn't preclude regional clusters, or a priori any other future exotic cluster configuration that might come up.

I think AA was a step forward, but ultimately its limitations make it an incomplete solution. I have some optimism about the custom scheduler, but I think there's more exploratory work needed there to improve it to the point we can recommend it to users, or even bundle it into Tekton proper. There might very well be other options we haven't considered, and it's possible a custom scheduler isn't sufficient either.

@jlpettersson
Member Author

jlpettersson commented Nov 26, 2020

(As an aside, Tekton should guarantee concurrent tasks -- that is, that two tasks be scheduled to run at the same time, whether or not constraints of the underlying systems prevent them from actually executing "in parallel". Rob Pike has a good talk about the difference between parallelism and concurrency, FWIW: https://vimeo.com/49718712)

What I mean by parallel Tasks is Tasks that are parallel within a Pipeline graph - even though the processes may execute concurrently. See this Pipeline with parallel tasks (time goes from left to right - the way to draw "concurrency" is to draw the tasks in "parallel"):

        (b)
      / 
  (a) 
      \
        (c)
  • If Tasks (b) and (c) run concurrently in most cases - I would say that Tekton supports parallel Task execution (as seen in the Pipeline graph).

  • If Tasks (b) and (c) are forced to run in sequence in most cases - I would say that Tekton does not support parallel Tasks ("parallel" as seen in the Pipeline graph). It is only supported in the pipeline graph - but the user is "tricked" about "parallel" task execution.
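
As an illustration of "parallel as seen in the Pipeline graph", the following Python sketch groups the tasks of a graph into waves of tasks that have no ordering between them. The function name and the dictionary representation are assumptions made for this example, not Tekton's API or its actual scheduling logic.

```python
def execution_waves(depends_on):
    """Group tasks into waves; tasks in one wave may run concurrently.

    depends_on maps each task name to the list of tasks it must run after.
    """
    done, waves = set(), []
    remaining = dict(depends_on)
    while remaining:
        # A task is ready once everything it depends on has finished.
        ready = sorted(t for t, deps in remaining.items() if set(deps) <= done)
        if not ready:
            raise ValueError("cycle in pipeline graph")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# The graph drawn above: (b) and (c) both run after (a).
print(execution_waves({"a": [], "b": ["a"], "c": ["a"]}))  # [['a'], ['b', 'c']]
```

The graph only says that (b) and (c) may run at the same time; whether they actually do depends on where their pods can be scheduled.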

@imjasonh
Member

Because Tekton can't guarantee (b) and (c) run in parallel, we should not make any guarantees that it can.

Parallelism is like caching the result of an expensive operation in this case. It's an optimization we should make whenever it's beneficial, but if a user comes to rely on it for correctness or as a necessary performance improvement, that's going to be a problem. Unfortunately it's hard to tell in some cases whether you secretly depend on cached state until it's not available. 😅

We should set user expectations that even though a Pipeline specifies concurrent tasks, and Tekton will try to run them in parallel, we can't guarantee it. So if your tasks require each other to be running at the same time (e.g., task A runs a container that task B connects to), or if your pipeline requires both tasks to run in parallel to finish under a deadline (e.g., two 9-minute tasks in a pipeline with a 10-minute timeout), then Tekton won't be able to guarantee that your pipeline can succeed.
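
The deadline example can be worked through with a small Python sketch. It assumes an idealized model in which parallel tasks take as long as the slowest task and sequential tasks take the sum of their durations, ignoring scheduling and startup overhead. The function name is hypothetical.

```python
def pipeline_minutes(task_minutes, parallel):
    """Idealized duration: slowest task if parallel, total if sequential."""
    return max(task_minutes) if parallel else sum(task_minutes)


TIMEOUT_MINUTES = 10
tasks = [9, 9]  # two 9-minute tasks

# Parallel: 9 minutes, which fits within the 10-minute timeout.
print(pipeline_minutes(tasks, parallel=True) <= TIMEOUT_MINUTES)   # True

# Sequential: 18 minutes, which exceeds the 10-minute timeout.
print(pipeline_minutes(tasks, parallel=False) <= TIMEOUT_MINUTES)  # False
```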

@jlpettersson
Member Author

Some more sources about why users might want to use parallel tasks

A hot topic amongst continuous delivery pipelines is parallelism: doing a lot of work, perhaps as much as you can with the resources you have, at the same time. In some cases that is triggering jobs and allowing them to run concurrently, other times it is running parts of your pipeline in parallel with the aim of getting feedback sooner and making the most of your resources.

Optimizing your pipeline with parallelism
Reasons you may want to use parallelism to get faster results:

You have an abundance of elastic build workers you can distribute parts of your work over
You have really powerful build machines with lots of CPUs you can utilize
Your pipeline has a lot of IO "waiting"
Your pipeline has integration tests that depend on (slow) external services
When you visualize your pipeline, parts may leap out that you realize you can do in parallel. This may happen as you plan out your pipeline, or perhaps when you are watching things execute and are bothered by hotspots where things seem to slow down

From https://www.cloudbees.com/blog/parallelism-and-distributed-builds-jenkins

Quick feedback in regards to build times is important in Continuous Integration. If builds become too
long, it can hurt the rate of software development. There are multiple methods to reduce build times.
One commonly suggested method is to parallelize builds.

Since the CI build task is recurring, the total time spent on waiting for it to
complete can become a significant amount. Reducing the total build time will increase the potential
productivity as more time is freed up [2]. One widely suggested approach to reduce build times is to
run tests in parallel in the CI software. When dividing the work of a task into 2 parallel subtasks, the
time it takes is in best case cut in half

The literature agrees on the importance of short build times in Continuous Integration. One broadly
suggested approach to shorten build times is to parallelize builds.

From The Effects of Parallelizing Builds in Continuous Integration Software

I haven't made any statements about guarantees; I think this is more about providing "good support" or "poor support" for parallel tasks.

@vdemeester vdemeester added the kind/question Issues or PRs that are questions around the project or a particular feature label Nov 30, 2020
@ghost

ghost commented Dec 16, 2020

Task Parallelism - should Tekton support parallel Tasks?

Run Pipelines in regional clusters - should Tekton support it?

I think the answer to both questions is "yes (and it already does)". I understand that there's a really tricky problem that users currently have to work around because of the platform (PVCs in regional clusters) but that doesn't preclude them from either a) modelling parallel Tasks in their Pipelines or b) from running concurrent Tasks in a regional cluster.

I think this is more about providing "good support" or "poor support" for parallel tasks.

Yeah I think I generally agree with this perspective. And to be honest I think we could bump up our level of support a lot just by clearly and concisely documenting the problems users with PVCs in regional clusters will face, and some solutions - manual solutions - they can take to work around those problems. So, for example, document use of nodeSelector / affinity to tie TaskRuns in a PipelineRun to specific Nodes. This would be in addition to the existing docs we provide describing the behaviour of the Affinity Assistant.
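
As a rough sketch of what such a manual workaround could look like, the Python snippet below adds a zone nodeSelector to a PipelineRun manifest held as a plain dictionary. It assumes the PipelineRun's `spec.podTemplate.nodeSelector` field and the well-known Kubernetes node label `topology.kubernetes.io/zone`. The helper name, pipeline name, and zone value are hypothetical, and the right label and value depend on the cluster.

```python
import copy
import json

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def pin_to_zone(pipeline_run, zone):
    """Return a copy of the manifest whose pods are restricted to one zone."""
    pinned = copy.deepcopy(pipeline_run)
    pod_template = pinned.setdefault("spec", {}).setdefault("podTemplate", {})
    # All TaskRun pods of this PipelineRun inherit the node selector.
    pod_template.setdefault("nodeSelector", {})[ZONE_LABEL] = zone
    return pinned


# Hypothetical PipelineRun; only the fields relevant here are shown.
run = {
    "kind": "PipelineRun",
    "metadata": {"name": "build-and-test-run"},
    "spec": {"pipelineRef": {"name": "build-and-test"}},
}
print(json.dumps(pin_to_zone(run, "zone-a"), indent=2))
```

Pinning every pod to one zone keeps the pods next to their zonal PVCs, at the cost of not using capacity in the other zones for that run.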

I think this issue is also pushing on something slightly different to just binary "good"/"poor" support too. The questions I keep coming back to are: "how much code and complexity should Pipelines throw at this problem?" and "how opinionated should Pipelines' solutions be?". I'm going to dwell on these questions a bit over the holidays but something that I think about a lot is how solutions to these problems will affect the many groups that use Tekton: Not just folks operating Tekton Pipelines clusters, but also folks building third party tools on top of it, the people writing entire platforms on top of Tekton's components, and the people who are integrating with or conforming to the API without leveraging Tekton's components. There is a really broad set of people and companies with really different requirements for how persistent storage should be handled for their org / product / platform.

Anyway, at the very least I think there's a massive documentation gap here currently and we could more clearly and concisely document how the choice of PVC affects parallelism and how it can be worked around.

@jlpettersson
Member Author

I am closing this in favor of the design doc: Task parallelism when using workspace

3 participants