Task parallelism and Regional clusters - supported? #3563
Comments
My two cents:
Tekton cannot guarantee parallel execution of Tasks. It should whenever possible attempt to schedule two Pods such that they run at the same time, but many things outside its control (Kubernetes' scheduler, available resources, etc.) can prevent parallel execution. Scheduling PVCs is a prime example, but also available CPU and RAM, a namespace's pod quota, and others, are outside Tekton's control. So Tekton should never guarantee that two Tasks run in parallel. (As an aside, Tekton should guarantee concurrent tasks -- that is, that two tasks be scheduled to run at the same time, whether or not constraints of the underlying systems prevent them from actually executing "in parallel". Rob Pike has a good talk about the difference between parallelism and concurrency, FWIW: https://vimeo.com/49718712)
Yes, as well as it can. Tekton shouldn't preclude regional clusters, or a priori any other exotic cluster configuration that might come up in the future. I think the Affinity Assistant (AA) was a step forward, but ultimately its limitations make it an incomplete solution. I have some optimism about the custom scheduler, but I think more exploratory work is needed to improve it to the point where we can recommend it to users, or even bundle it into Tekton proper. There might very well be other options we haven't considered, and it's possible a custom scheduler isn't sufficient either.
What I mean by parallel Tasks is Tasks that are drawn in parallel within a Pipeline graph, even though the underlying processes may only execute concurrently. See this Pipeline with parallel tasks (time goes from left to right; the way to draw "concurrency" is to draw the tasks in "parallel"):
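As a rough illustration of what "parallel in the graph" means, here is a minimal Python sketch that groups the tasks of a pipeline into stages with no ordering between the tasks in a stage. The task names and the `pipeline` mapping are made up for this example, and the sketch says nothing about how Tekton itself resolves the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: task name -> set of tasks it must run after.
pipeline = {
    "clone": set(),
    "unit-test": {"clone"},
    "lint": {"clone"},
    "build-image": {"unit-test", "lint"},
}

def concurrency_stages(graph):
    """Group tasks into stages; tasks within a stage have no ordering between them."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose predecessors are done
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = concurrency_stages(pipeline)
for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(tasks)}")
```

Tasks that land in the same stage are the ones drawn side by side in the graph. Whether they actually execute at the same moment still depends on the cluster.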
Parallelism is like caching the result of an expensive operation in this case, because Tekton can't guarantee it. It's an optimization we should make whenever it's beneficial, but if a user comes to rely on it for correctness or as a necessary performance improvement, that's going to be a problem. Unfortunately it's hard to tell in some cases whether you secretly depend on cached state until it's not available. 😅 We should set user expectations that even though a Pipeline specifies concurrent tasks, and Tekton will try to run them in parallel, we can't guarantee it. So if your tasks require each other to be running at the same time (e.g., task A runs a container that task B connects to), or if your pipeline requires both tasks to run in parallel to finish under a deadline (e.g., two 9-minute tasks in a pipeline with a 10-minute timeout), then Tekton won't be able to guarantee that your pipeline can succeed.
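To make the deadline example concrete, here is a small Python sketch of the arithmetic. It assumes the simplest possible model (tasks either all overlap fully or run strictly one after another, with no scheduling overhead); the function name is made up for this illustration.

```python
def wall_clock_minutes(task_minutes, parallel):
    """Total elapsed time if tasks fully overlap (parallel) or run back to back."""
    return max(task_minutes) if parallel else sum(task_minutes)

task_minutes = [9, 9]   # the two 9-minute tasks from the example
timeout_minutes = 10    # the pipeline timeout from the example

for parallel in (True, False):
    elapsed = wall_clock_minutes(task_minutes, parallel)
    mode = "in parallel" if parallel else "serialized"
    verdict = "finishes in time" if elapsed <= timeout_minutes else "exceeds the timeout"
    print(f"{mode}: {elapsed} min, {verdict}")
```

Under these assumptions the pipeline only meets its timeout when the two tasks really do overlap, which is exactly the case Tekton cannot promise.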
Some more sources about why users might want to use parallel tasks:
From https://www.cloudbees.com/blog/parallelism-and-distributed-builds-jenkins
From The Effects of Parallelizing Builds in Continuous Integration Software
I haven't made any statements about guarantees; I think this is more about providing "good support" or "poor support" for parallel tasks.
I think the answer to both questions is "yes (and it already does)". I understand that there's a really tricky problem that users currently have to work around because of the platform (PVCs in regional clusters) but that doesn't preclude them from either a) modelling parallel Tasks in their Pipelines or b) from running concurrent Tasks in a regional cluster.
Yeah I think I generally agree with this perspective. And to be honest I think we could bump up our level of support a lot just by clearly and concisely documenting the problems users with PVCs in regional clusters will face, and some solutions - manual solutions - they can take to work around those problems. So, for example, document use of

I think this issue is also pushing on something slightly different to just binary "good"/"poor" support too. The questions I keep coming back to are: "how much code and complexity should Pipelines throw at this problem?" and "how opinionated should Pipelines' solutions be?". I'm going to dwell on these questions a bit over the holidays but something that I think about a lot is how solutions to these problems will affect the many groups that use Tekton: not just folks operating Tekton Pipelines clusters, but also folks building third party tools on top of it, the people writing entire platforms on top of Tekton's components, and the people who are integrating with or conforming to the API without leveraging Tekton's components. There is a really broad set of people and companies with really different requirements for how persistent storage should be handled for their org / product / platform.

Anyway, at the very least I think there's a massive documentation gap here currently and we could more clearly and concisely document how the choice of PVC affects parallelism and how it can be worked around.
I am closing this in favor of the design doc: Task parallelism when using workspace
Opening this for discussions.
Task Parallelism - should Tekton support parallel Tasks?
Parallel Tasks in a Pipeline are a way to get faster feedback: do more in a shorter time. Most pipeline products support parallel Tasks. This is trivial in single-node solutions, but trickier in cluster solutions. Another consideration is that Tasks likely need to share files using the workspace functionality.
Original issue: #2586
Run Pipelines in regional clusters - should Tekton support it?
Cloud environments are growing in popularity and "Availability" is becoming more important: organizations are running their workloads in regional clusters. In addition, the role of CI/CD is becoming more important (ref. e.g. the Accelerate book). It is now also used to provision infrastructure using "Infrastructure as Code", so organizations are becoming more dependent on "pipeline systems". A problem with Tasks in a regional cluster arises when a Task mounts more than one PVC and the PVCs are not located in the same AZ.
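A minimal Python sketch of that multi-PVC problem follows. It assumes zonal volumes that can only be attached from nodes in their own zone; the PVC names and zones are hypothetical, and this is not how Tekton or Kubernetes represents the information.

```python
# Hypothetical PVC -> availability zone placement (names are illustrative only).
pvc_zone = {"source": "zone-a", "cache": "zone-b", "artifacts": "zone-a"}

def zones_for(task_pvcs, pvc_zone):
    """Set of zones that the PVCs mounted by one task live in."""
    return {pvc_zone[name] for name in task_pvcs}

def is_schedulable(task_pvcs, pvc_zone):
    """Assumption: a pod runs in exactly one zone, so all of its zonal PVCs must share it."""
    return len(zones_for(task_pvcs, pvc_zone)) <= 1

print(is_schedulable(["source", "artifacts"], pvc_zone))  # both PVCs in zone-a
print(is_schedulable(["source", "cache"], pvc_zone))      # PVCs split across zones
```

A task whose PVCs span two zones has no node that can mount both, so its pod cannot be placed at all under this assumption.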
Problem issue: #2540
Notes
The main problem with both of the above is the use of Kubernetes Persistent Volumes.
The current solution for task parallelism is the Affinity Assistant, but a better solution is a custom scheduler #3052 - work has started in https://github.com/tektoncd/experimental/tree/master/scheduler
The idea with both these implementations is to schedule the pods that share a PVC to the node where the PVC is.
That idea limits the number of PVCs a Task can use to 1 (unless e.g. #3559 is implemented), which also avoids the problem with running Tasks in a regional cluster.
Other ideas are welcome.
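The placement idea in the notes above can be sketched in a few lines of Python. This is only an illustration of the rule "pods that share a PVC go to the node where the PVC is"; it is not the Affinity Assistant or the custom scheduler, and all pod, PVC, and node names are invented.

```python
def place_pods(pod_pvcs, pvc_node):
    """Map each pod to the node that holds its PVC.

    pod_pvcs: pod name -> list of PVC names the pod mounts (hypothetical input)
    pvc_node: PVC name -> node where the volume is attached (hypothetical input)
    """
    placement = {}
    for pod, pvcs in pod_pvcs.items():
        if len(pvcs) > 1:
            # The idea only works with one PVC per Task, as noted above.
            raise ValueError(f"{pod} mounts {len(pvcs)} PVCs; only one is supported")
        # A pod without a PVC has no placement constraint (None).
        placement[pod] = pvc_node[pvcs[0]] if pvcs else None
    return placement

pvc_node = {"ws-1": "node-a", "ws-2": "node-b"}
pod_pvcs = {"lint": ["ws-1"], "unit-test": ["ws-1"], "docs": ["ws-2"], "notify": []}
print(place_pods(pod_pvcs, pvc_node))
```

Pods sharing `ws-1` end up on the same node, so they can run concurrently against the same volume, while a pod with two PVCs is rejected outright.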
Some assumptions that have been made:
The ReadWriteOnce access mode is by far the most common, see PV Access Modes