-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Select cluster execution when triggering an execution #4956
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# [RFC] Select cluster execution when triggering an execution | ||
|
||
**Authors:** | ||
|
||
- @RRap0so | ||
- @andresgomezfrr | ||
|
||
## 1 Executive Summary | ||
|
||
When triggering a Flyte LaunchPlan, Workflow or Task, Flyte permits the user to define a ClusterPool from a predefined list of pools that are set in the FlyteAdmin configuration. For example with `flytectl create execution | ||
--clusterPool string` | ||
|
||
This provides greater flexibility compared to the existing feature where an [Execution Cluster Label](https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#configure-execution-cluster-labels) can be set at the Project/Domain level. | ||
|
||
## 2 Motivation | ||
|
||
The primary motivation for this proposal is to select the clusters where an execution is to be performed dynamically at execution time, without the constant need to overwrite the Execution Cluster Label. | ||
|
||
Additionally, we would like to have the possibility of different executions targeting different clusters, even if they are published under the same Project/Domain. This is not entirely feasible using the current Execution Cluster Label. | ||
|
||
|
||
## 3 Proposed Implementation | ||
|
||
We plan to use the `cluster_assignement` field (refer to [proto](https://github.com/flyteorg/flyte/blob/a2a63eb893bdff3c2d1a01242d0d25b73f287b1f/flyteidl/protos/flyteidl/admin/execution.proto#L315)) in each execution, this is a ClusterAssignment object with a `cluster_pool_name` which would be the field that contains a string with the corresponding ClusterPool ID. | ||
|
||
In the FlyteAdmin configuration we have a list of `clusterpool_config` with all the different available pools and corresponding clusters. | ||
|
||
**Note**: Today there are configurations to assign domains to specific pools, see [PR](https://github.com/flyteorg/flyte/pull/4208) and `cluster_pool` [interface](https://github.com/flyteorg/flyte/blob/962d45827646f106265630a0fb1d9ae74968bc32/flyteadmin/pkg/runtime/interfaces/cluster_pools.go). We propose another configuration to assign Clusters to a clusterPoolName by having another map under cluster object with weights and cluster assignments. | ||
|
||
Proposed configuration: | ||
``` | ||
clusters: | ||
poolNameClusterMap: | ||
pool_1: | ||
- id: flyte-1-k8s | ||
weight: 1 | ||
- id: flyte-2-k8s | ||
weight: 2 | ||
pool_2: | ||
- id: flyte-3-k8s | ||
weight: 6 | ||
- id : flyte-4-k8s | ||
weight: 99 | ||
labelClusterMap: | ||
production: | ||
- id: flyte-1-k8s | ||
weight: 1 | ||
clusterConfigs: | ||
- name: flyte-1-k8s | ||
endpoint: 127.0.0.1 | ||
enabled: true | ||
auth: | ||
type: file_path | ||
tokenPath: /etc/… | ||
certPath: /etc/… | ||
- name: flyte-2-k8s | ||
endpoint: 127.0.0.2 | ||
enabled: true | ||
... | ||
``` | ||
|
||
At the start of FlyteAdmin it will create a new RandomClusterSelector with a `poolWeightedRandomMap`. | ||
|
||
|
||
### When an execution is launched | ||
|
||
Flyteadmin runs the [Execute](https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/workflowengine/impl/k8s_executor.go#L36) method and builds the workflow. Now we need to know which Kubernetes Cluster to hit. We will have the pool information available in the ExecutionData that is already passed into the Execute method. | ||
|
||
**If no clusterPool is defined**: No change in current behavior. | ||
|
||
**If clusterPool is defined**: Then we will add the poolName into the [ExecutionTargetSpec](https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/executioncluster/execution_target.go#L13-L20) (we will need to modify the struct). This is passed to the getTarget where we check if it has a poolName defined and get the value from the poolWeightedRandomMap. There are still open questions regarding the priority of labels / pool names. | ||
|
||
|
||
## 4 Metrics & Dashboards | ||
|
||
If implemented correctly, we should be able to capture where executions are being triggered. This would be extra useful for our use case to make sure we’re routing executions to the correct pool of clusters | ||
|
||
## 5 Drawbacks | ||
|
||
Other than the usual drawbacks of implementing a new feature (the introduction of new bugs/problems) we don’t see any potential drawbacks with this implementation. The fields are already exposed but they are only missing the implementation. | ||
|
||
We don’t expect any additional overhead for execution or impact in terms of API for flyteadmin. | ||
|
||
## 6 Alternatives | ||
|
||
As mentioned above, the only alternative we have implemented today is using the [Execution Cluster Labels](https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#configure-execution-cluster-labels). This can bring further problems when we want to route executions to different clusters from the same Project/Domain. | ||
|
||
## 7 Potential Impact and Dependencies | ||
|
||
We do not see any. | ||
|
||
|
||
## 8 Unresolved questions | ||
|
||
What takes priority Labels or Pools? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would vote for labels to take priority as those seem to be more execution specific. I might be missing something though There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes the label usage seems more related to "I need this X type of cluster for this <PROJECT/Domain>". So to be sure, we first check if there's a label set, then under the clusters in that label if there's a pool. If not then default using the weight. (Similar to what is done today.) |
||
Since we have two different ways to route traffic we need to decide what takes priority. | ||
|
||
## 9 Conclusion | ||
|
||
With this we want to bring to the finish line an idea that we believe was going to be done but was not yet implemented. We will reuse as many already existing objects as possible and will be able to smoothly select clusters at a moment's notice which for our use case is extremely important to ensure the timely and reliable execution of LaunchPlans/Workflows/Tasks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said in the contributors' sync:
Love the proposal, my only wish would be to figure out whether there is a way to simplify this config here by not having to configure a cluster twice, once for the existing project/domain mechanism, once for the new mechanism. Would be nice if the cluster configuration could be shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fg91 I think it should be possible by expanding the
ExecutionParameters
struct. By adding theExecutionClusterLabel
to this struct and making sure we pass it when executing it should make things simpler and avoid having double configs.This will mean a bit of re-writing, since it's late I'll also cross-check my thinking and get back with some certainties next week.