
[Feature][master] When deployed on k8s and yarn, dolphinscheduler supports worker elastic scaling #9337

Closed

huagetai opened this issue Apr 2, 2022 · 14 comments

Comments

@huagetai (Contributor) commented Apr 2, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

When deployed on cluster managers such as Kubernetes (k8s) and YARN, DolphinScheduler should support elastic scaling of workers.
The initial idea is as follows:
[Diagram: DolphinSchedule 001]

  1. Add a ResourceManager to the master server.
  2. The ResourceManager is responsible for managing and requesting worker resources.
  3. The ResourceManager detects worker nodes coming online and going offline through the registry.
  4. The ResourceManager requests resources from the cluster manager on a per-workflow basis; the resources are released when the workflow finishes. (A sketch of such an interface is shown below.)
Use case

worker elastic scaling

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

huagetai added the `feature` and `Waiting for reply` labels on Apr 2, 2022

@caishunfeng (Contributor) commented:

The ResourceManager requests resources from the cluster manager on a per-workflow basis; the resources are released when the workflow finishes.

Hi @huagetai, in DS the master handles workflows/processes and the worker handles tasks, so what about managing resources per task?

@davidzollo (Contributor) commented:

Hi, any progress?

@davidzollo (Contributor) commented:

In this picture, I want to know how the worker is handled when the task finishes in the K8s scenario.

[Diagram: DolphinSchedule 001]

@caishunfeng (Contributor) commented:

Hi, any progress?

It seems there is no progress update. @davidzollo, you can try it if you are interested.

@caishunfeng (Contributor) commented:

In this picture, I want to know how the worker is handled when the task finishes in the K8s scenario.

[Diagram: DolphinSchedule 001]

I think there are two modes: worker-per-pod and task-per-pod; the issue description refers to the second one.

@EricGao888 (Member) commented:

Hi @huagetai , may I ask whether there will be any follow-ups for this issue? Thanks : )

EricGao888 added the `discussion` label on Dec 6, 2022
@EricGao888 (Member) commented:

BTW, I think we may not want to put the ResourceManager into the master. We need more discussion on this point.

EricGao888 removed their assignment on Dec 7, 2022
@Radeity (Member) commented Dec 17, 2022

BTW, I think we may not want to put the ResourceManager into the master. We need more discussion on this point.

Hi @EricGao888, I think DS should only get involved in executor resource management in standalone mode. BTW, I think we should keep the concepts of Master and Worker and introduce a new concept, Executor, in charge of actual task execution, like the design in Spark.

In the current design, many types of tasks run directly on the worker. I think DS's Master and Worker should focus on scheduling; only in standalone mode should the Worker be in charge of computing-resource management and launching Executors. In other modes like DS on K8S / YARN, the Master can directly launch remote Executors in the external K8S or YARN cluster, which is then in charge of resource management. (See the sketch after this comment.)

In a word, this could be a big architectural change. Anyway, just a humble suggestion; feel free to correct me. I'm interested in any further discussion!
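
A rough sketch of what such a pluggable, Spark-like Executor split might look like; `Executor`, `ExecutorLauncher`, `ExecutorSpec`, `TaskPayload`, and `TaskResult` are hypothetical names for illustration, not existing DS types:

```java
import java.util.Map;

// Hypothetical sketch of a Spark-like Master/Worker/Executor split.
interface Executor {
    // Runs a single task and returns the result to the scheduler.
    TaskResult execute(TaskPayload task);
}

// One launcher implementation per deployment mode, chosen by configuration.
interface ExecutorLauncher {
    // Standalone mode: the Worker manages local compute and starts the
    // executor in-process. DS on K8S / YARN: the Master asks the external
    // cluster manager to start a remote executor, delegating resource
    // management to that cluster.
    Executor launch(ExecutorSpec spec) throws Exception;
}

record ExecutorSpec(double cpu, long memoryMb, Map<String, String> env) {}
record TaskPayload(long taskInstanceId, String taskType, String taskParams) {}
record TaskResult(long taskInstanceId, int exitCode, String log) {}
```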

@hdygxsj (Contributor) commented Dec 21, 2022

In this feature, I would like to know how the following three questions are addressed:

  1. For scheduled workflows, when will their pods be created? Since the API server takes a certain amount of time to create a pod, will the task start time be delayed?
  2. For resource allocation, if pods are created dynamically based on workflows or tasks, how does DolphinScheduler specify the number of CPUs and the amount of memory required by each pod? Is it specified when the workflow is created?
  3. When deploying DolphinScheduler, will dynamic worker creation be optional, and will the old deployment mode be retained (e.g. DolphinScheduler on K8s vs. DolphinScheduler on native K8s)?

@EricGao888 (Member) commented:

In other modes like DS on K8S / YARN, the Master can directly launch remote Executors in the external K8S or YARN cluster, which is in charge of resource management.

@Radeity Thanks for being interested. Actually, we haven't decided how to do it yet. I'm +1 on the point quoted above about launching remote Executors.

As for "DS should only get involved in executor resource management in standalone mode": I think we do not need to spend time on how to manage executor resources in standalone mode, since standalone mode is only for development and trying things out, not for production.

@EricGao888 (Member) commented Jan 3, 2023

In this feature, I would like to know how the following three questions are addressed:

  1. For scheduled workflows, when will their pods be created? Since the API server takes a certain amount of time to create a pod, will the task start time be delayed?
  2. For resource allocation, if pods are created dynamically based on workflows or tasks, how does DolphinScheduler specify the number of CPUs and the amount of memory required by each pod? Is it specified when the workflow is created?
  3. When deploying DolphinScheduler, will dynamic worker creation be optional, and will the old deployment mode be retained (e.g. DolphinScheduler on K8s vs. DolphinScheduler on native K8s)?

@hdygxsj Good questions.

  1. I think we could make use of the K8s HorizontalPodAutoscaler (HPA), with the number of queued and running tasks as the metric. DS would scale the workers once that number reaches a threshold; by the time a task gets dispatched, the workers are already running and there is no extra latency. (A sketch of exposing such a metric follows this list.)
  2. In the design of elastic workers, we could follow one worker per pod instead of dynamically creating a pod per task; the latter is a separate effort tracked in [Feature][Executor] Add K8S Executor for task dispatching #13316.
  3. IMHO, we could try to make things, e.g. the executor, extensible and pluggable. Users would be able to choose which one to use, so the design stays compatible.
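
As an illustration of point 1, a minimal sketch of exposing the queued-plus-running task count as a gauge that an autoscaler (e.g. an HPA via a Prometheus adapter) could scale on. It uses Micrometer, but the metric name `ds.tasks.pending` and the `TaskQueue` type are assumptions for this sketch, not existing DS metrics:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter of queued + running tasks; not an existing DS class.
class TaskQueue {
    final AtomicInteger pending = new AtomicInteger();

    int pendingCount() {
        return pending.get();
    }
}

class WorkerScalingMetrics {
    // Expose the backlog through the metrics endpoint. An HPA configured
    // against this metric would add worker pods once the value crosses a
    // threshold, so workers are already warm when tasks get dispatched.
    static void register(MeterRegistry registry, TaskQueue queue) {
        Gauge.builder("ds.tasks.pending", queue, TaskQueue::pendingCount)
                .description("Queued plus running tasks awaiting worker capacity")
                .register(registry);
    }
}
```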

@EricGao888 (Member) commented:

related: #13316

EricGao888 self-assigned this on Jan 3, 2023
@leehom commented Feb 27, 2024

I've recently been working on something similar: an elastic resource component. The whole concept is based on Flink's cluster and resource model, i.e. declarative resource management.

Technical architecture: resources are the core data of the component, handled along two paths, A and B:

4 resource request -> 5a allocate available resource -> 6a request to use resource -> 7a provide resource -> 8a submit task
4 resource request -> 5b allocate pending resource -> 6b request new worker -> 7b start TaskManager -> 8b register/report resources

Path A allocates existing resources; path B requests new resources. Once a new resource registers, it becomes an existing resource and is allocated through path A. (A sketch of this two-path flow is shown below.)
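
To make the two paths concrete, a hedged sketch of the flow described above; `ElasticResourcePool`, `ResourceRequest`, and `Slot` are made-up names for illustration, not code from Flink or DolphinScheduler:

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

// Hypothetical sketch of the two-path, declarative allocation flow.
class ElasticResourcePool {

    // Slots reported by already-registered workers (step 8b feeds this).
    private final Queue<Slot> registered = new ArrayDeque<>();

    // Path A (5a-8a): satisfy the request from existing resources.
    Optional<Slot> allocateExisting(ResourceRequest request) {
        return Optional.ofNullable(registered.poll());
    }

    // Path B (5b-7b): nothing available, so ask the cluster manager for a
    // new worker; once it registers (8b), its slots join `registered` and
    // later requests are served through path A.
    void requestNewWorker(ResourceRequest request) {
        // call the cluster manager here (K8s / YARN) -- omitted in this sketch
    }

    // Entry point for step 4: a declarative resource request.
    Optional<Slot> allocate(ResourceRequest request) {
        Optional<Slot> slot = allocateExisting(request);   // path A
        if (slot.isEmpty()) {
            requestNewWorker(request);                     // path B
        }
        return slot;
    }
}

record ResourceRequest(double cpu, long memoryMb) {}
record Slot(String workerAddress) {}
```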
