[RFC] Job queueing functionality with Ray Serve + Workflows #21161
Comments
Thanks a lot for putting together this proposal @ericl. One topic we should drill into a bit more is the persistence layer, and what to persist. For example, storing execution metadata and potentially other data can put significant load on the underlying persistence technology. What's your thinking around this?
This document could also be useful for feature comparison: https://docs.google.com/spreadsheets/d/1Q_v__QtYzDtIDOkfmeQqVAw_jiksBz-GbJn-m-pDM34/edit#gid=0
Currently, Workflows supports filesystem and S3-based persistence. This is relatively high latency (~100ms+ per operation), but very scalable. Once we add an index, this should also scale nicely for very large queues of jobs.
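To make the indexing idea concrete, here is a minimal sketch of keeping a small per-status index object next to the job records, so that listing queued jobs costs a single read on a ~100ms-per-operation backend instead of a scan over every record. The JobMetadataStore class and the backend interface are hypothetical illustrations, not the actual Workflows storage layout.

```python
# Hypothetical sketch: a per-status index stored alongside job records, so that
# "list all QUEUED jobs" reads one small index object instead of scanning every
# job record on a high-latency (~100ms/op) filesystem or S3 backend.
# `backend` is any object with get(key) -> bytes and put(key, data); it is a
# stand-in for the real storage layer, not a Ray API.
import json


class JobMetadataStore:
    def __init__(self, backend):
        self.backend = backend

    def _load_index(self, status):
        try:
            return set(json.loads(self.backend.get(f"index/{status}")))
        except KeyError:  # no index object written yet
            return set()

    def _save_index(self, status, job_ids):
        self.backend.put(f"index/{status}", json.dumps(sorted(job_ids)).encode())

    def put_job(self, job_id, metadata):
        # Two storage ops per update: the job record itself plus its index entry.
        # A status transition would also remove the id from the old index (omitted).
        self.backend.put(f"jobs/{job_id}", json.dumps(metadata).encode())
        index = self._load_index(metadata["status"])
        index.add(job_id)
        self._save_index(metadata["status"], index)

    def list_jobs(self, status):
        # One index read regardless of queue size, instead of O(num_jobs) reads.
        return self._load_index(status)
```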
Also cc @pcmoritz
Would this support job prioritization or dynamic resources for each job? For example, I have a long-running training job and want it to use all the resources in my cluster. But if a higher priority job comes in, I want to move some of my resources from the training job to the new high priority job. Assume that the training job can natively handle scaling down/up without having to restart training.
Great question. In the RFC we only have support for rate limiting. Supporting prioritized jobs / multiple queues for different types of jobs would be the next obvious extension [P1], though we may need to add features to Ray to support prioritization of already-running jobs.
It would be interesting to have this guided by use cases. Historically, prioritization of work was driven by hard capacity constraints. When running on a cloud there is still a quota set for an account, but in general there is much more freedom to allocate new capacity as needed, and also to take alternative approaches, which would be interesting to think through. For example, instead of making prioritization trade-offs within Ray, there is also the option of just spinning up another Ray cluster for workloads of a different priority. That would also ensure (even more strongly) that low- and high-priority work don't step on each other, while avoiding the need to implement complex cluster-internal prioritization logic. WDYT?
That's a great point. Indeed, taking a step back, prioritization can be solved at a higher layer for most use cases, so there isn't a strong need to support that within a cluster beyond simple rate limiting.
Rate limiting for workflows would be really helpful; we're currently struggling a bit to set up a workflow where we have to call an API at a certain rate per second.
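Until something like this lands natively, a minimal client-side throttle can be used inside the step that makes the calls. This is a sketch under the assumption of a single process issuing the requests; call_api and the 5 requests/second budget are placeholders.

```python
# Minimal client-side throttle (evenly spaced calls), usable inside a workflow
# step until Workflows grows native rate limiting. Only throttles within one
# process; distributed rate limiting would need shared state.
import threading
import time


class RateLimiter:
    def __init__(self, calls_per_second: float):
        self._interval = 1.0 / calls_per_second
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed to go out."""
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if delay > 0:
            time.sleep(delay)


def call_api(item):
    # Placeholder for the real external API call.
    return {"item": item}


limiter = RateLimiter(calls_per_second=5)


def fetch_all(items):
    results = []
    for item in items:
        limiter.wait()  # enforce ~5 requests per second
        results.append(call_api(item))
    return results
```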
Interesting nuggets on Ray for job processing from this blog: https://vishnudeva.medium.com/scaling-applications-on-kubernetes-with-ray-23692eb2e6f0
This echoes other use cases for Ray as a unified & high-performance processing backend.
Re: job prioritization. Agreed about spinning up separate clusters for online and offline use cases in most scenarios. However, also consider that some use cases, like #2618 (comment), mention sharing the dataplane to do offline compute on intermediate outputs from the online pipeline while it is running, without needing to build an intermediate transfer/persistence layer. Conversely, some users want to run low-latency observability tasks against a primarily offline compute workload: #2618 (comment)
Also note the other use case, which is to have bursty workloads with low-latency requirements running on the same cluster as offline compute, to save costs compared to giving the former a standalone cluster (multi-tenancy): #16782 (comment)
Others:
To my mind, reliance on live usage statistics instead of placement groups could also help with better packing. Referencing how Lambda does its scheduling could perhaps help improve the autoscaler on this front.
Note that priority itself makes no difference if the low-latency online task is submitted when all resources are (logically or physically) fully consumed by long-running offline tasks. I'm not sure how valid the comments are about spinning up more nodes, since users may want tighter packing and faster response times for their low-latency workloads than the autoscaler can provide (furthermore, though it can be tuned, it may be difficult to tune the upscaling speed well without keeping permanent upscaling slack and thus lower resource efficiency). This is especially true if the online tasks can require massive scale (think a multi-node language or multi-modal model), which makes the packing problem harder without preemption. In light of this, a warm-engine worker pool plus preemptible task prioritization might be a solution that delivers tighter resource packing for bursty and unpredictable online workloads and/or same-dataplane offline+online workloads. Non-preemptible task prioritization may also solve the issue, but only if offline tasks are not long-running, i.e. are <100ms (which is unlikely, to my mind). Some parameters worth considering:
I recommend this very informative and relevant talk on preemption and packing of jobs for Nomad, HashiCorp's low-latency, lightweight cluster orchestration tool based on distributed scheduling via Sparrow. Key takeaways:
Additional thoughts:
Another use case for priority scheduling: prioritizing bottlenecked tasks (e.g. in an execution DAG): apache/datafusion#1221 (comment)
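To make the preemption discussion above a bit more concrete, here is a toy sketch of priority scheduling with preemption: when capacity is exhausted, the lowest-priority running task is preempted in favor of a higher-priority arrival. This is plain Python for illustration only, not Ray's scheduler; the class and task names are made up.

```python
# Toy illustration of preemptive priority scheduling (not Ray's scheduler).
# Higher `priority` wins; when all slots are full, the lowest-priority running
# task is preempted to admit a higher-priority arrival.
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)


class ToyScheduler:
    def __init__(self, slots: int):
        self.slots = slots
        self.running = []  # min-heap by priority -> cheapest preemption victim on top

    def submit(self, task: Task) -> str:
        if len(self.running) < self.slots:
            heapq.heappush(self.running, task)
            return f"admitted {task.name}"
        victim = self.running[0]
        if victim.priority < task.priority:
            heapq.heapreplace(self.running, task)
            # A real system would checkpoint and re-queue the victim here.
            return f"preempted {victim.name} for {task.name}"
        return f"queued {task.name} (no capacity and nothing cheaper to preempt)"


sched = ToyScheduler(slots=2)
print(sched.submit(Task(priority=0, name="offline-batch-1")))
print(sched.submit(Task(priority=0, name="offline-batch-2")))
print(sched.submit(Task(priority=1, name="online-inference")))  # preempts a batch task
```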
Currently, docs state:
Will the requirements of persistence be tracked under this RFC? Is handling head node failure part of this RFC? Take, for instance, the scenario where both the ray-operator and the head pod are scheduled on a spot instance, that spot instance gets unexpectedly killed, and the pods are restarted on another node (I guess this scenario is also relevant outside the Serve/Workflows context).
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
I believe this proposal is being superseded by #32292. Please comment there to voice your support and vote on which option is better for your use case! We are looking for more signals.
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message. Please feel free to reopen this issue or open a new one if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public Slack channel. Thanks again for opening the issue!
Overview
A number of users have raised the desire for a native Job queue within Ray, which can provide:
This aligns well with the combination of Serve + the Workflows project (https://docs.ray.io/en/latest/workflows/concepts.html), with a few enhancements. In particular, Workflows needs to:
workflow.init(max_concurrent_jobs=100)
Reference Architecture
The following proposes a reference architecture for supporting job queueing in Ray, with support for querying job status and integration with Serve & event sources.
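As a rough illustration of this shape (not part of the proposal itself), the queue frontend could be a Serve deployment that turns each submission into a durable workflow run and resumes interrupted runs on startup. The sketch below assumes the Ray 1.x-era Serve and Workflows alpha APIs, which may differ in current releases; the JobQueue deployment, the /jobs route, and the job payload format are illustrative.

```python
# Rough sketch of the proposed architecture: a Serve deployment as the job-queue
# frontend, each job running as a durable workflow, interrupted jobs resumed on
# startup. API names follow the Ray 1.x Workflows alpha / Serve APIs and may
# have changed; the deployment, route, and payload names are illustrative only.
import ray
from ray import serve, workflow


@workflow.step
def run_job(payload: dict) -> dict:
    # Placeholder for the actual job body (ETL, training, batch inference, ...).
    return {"processed": payload}


@serve.deployment(route_prefix="/jobs")
class JobQueue:
    def __init__(self):
        workflow.init()        # durable storage (filesystem/S3) configured out of band
        workflow.resume_all()  # recover jobs interrupted by a crash or restart

    async def __call__(self, request):
        payload = await request.json()
        job_id = payload["job_id"]
        # Fire and forget: the workflow's durable state is the source of truth,
        # so the caller can poll status later by job_id.
        run_job.step(payload).run_async(workflow_id=job_id)
        return {"job_id": job_id, "status": str(workflow.get_status(job_id))}


if __name__ == "__main__":
    ray.init()
    serve.start()
    JobQueue.deploy()
```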
At a high level, it is composed of a few parts:
Discussion
Fault tolerance:
workflow.resume_all()
on startup to resume interrupted jobs.
Scalability:
Status
Currently, this project is in discussion and we want to gather further feedback from the community. Should we proceed, most of the work involves adding operational features to workflows and examples:
P0:
P1:
cc @simon-mo @iycheng @edoakes @anabranch @yiranwang52