This is very useful.
We need to adapt to LWS, and I think Volcano will also need its own serving API in the future, such as vcserve.
I'm not sure the following is needed, because we could meet the requirements with StatefulSet and avoid introducing too many third-party packages:
Add a LeaderWorkerSet controller that reconciles to create PodGroups for LWS
I think it's okay. LWS is also an API driven by a Kubernetes SIG, unlike KServe, which really is third-party. If we don't add an LWS controller, we'll need to add special-case checks to the PodGroup controller, which isn't very readable, and I'm not sure what kinds of problems that would cause. We also need the RestartPolicy field in LWS.
What is the problem you're trying to solve
Background
The development and application of large language models are experiencing explosive growth, with open-source models like DeepSeek-R1 continuously emerging, driving the demand for developers to deploy large models in local environments. However, as the scale of model parameters continues to grow, the memory capacity of a single device has become insufficient to accommodate the complete model. Some inference frameworks have begun actively exploring multi-node distributed inference solutions:
New API for multi-node distributed inference
LeaderWorkerSet
A Kubernetes SIG has designed a new API for multi-node distributed inference scenarios, called LeaderWorkerSet:
https://github.com/kubernetes-sigs/lws
KServe ServingRuntime/ClusterServingRuntime WorkerSpec
Even KServe has modified its serving API, adding a new field called WorkerSpec to implement multi-node distributed inference.
After discussing with @Monokaix and @hwdef, we concluded that we should support LeaderWorkerSet first and gather feedback from end users.
Describe the solution you'd like
LeaderWorkerSet was designed with the concept of a logical PodGroup, corresponding to 1 Leader + n Workers. Volcano needs to keep this logical PodGroup concept consistent with Volcano's own PodGroup. The replicas in LeaderWorkerSet represent the number of Volcano PodGroups to be created. Within each PodGroup, one task is the Leader Pod, with a replica count of 1, and the other task is the Workers. So the following tasks need to be addressed:
- Add a LeaderWorkerSet controller that reconciles to create PodGroups for LWS
- Implement network topology aware scheduling for worker pods
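To make the mapping described above concrete, here is a minimal Python sketch of how one LeaderWorkerSet could translate into Volcano PodGroups. It is only an illustration, not the proposed controller: the classes `LeaderWorkerSetSpec`, `Task`, and `PodGroupPlan`, the function `plan_pod_groups`, and the `<name>-<index>` naming scheme are all hypothetical, and setting the minimum member count to the full group size assumes each group should be gang-scheduled as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeaderWorkerSetSpec:
    """Simplified stand-in for the LWS fields used here (not the real CRD)."""
    name: str
    replicas: int  # number of leader + worker groups
    size: int      # pods per group: 1 leader + (size - 1) workers


@dataclass
class Task:
    name: str
    replicas: int


@dataclass
class PodGroupPlan:
    """Hypothetical description of one Volcano PodGroup to create."""
    name: str
    min_member: int
    tasks: List[Task]


def plan_pod_groups(lws: LeaderWorkerSetSpec) -> List[PodGroupPlan]:
    """Return one PodGroup plan per LWS replica."""
    if lws.replicas < 0 or lws.size < 1:
        raise ValueError("replicas must be >= 0 and size must be >= 1")
    plans = []
    for index in range(lws.replicas):
        tasks = [
            Task(name="leader", replicas=1),             # exactly one leader
            Task(name="worker", replicas=lws.size - 1),  # the n workers
        ]
        plans.append(
            PodGroupPlan(
                name=f"{lws.name}-{index}",  # assumed naming scheme
                min_member=lws.size,         # assumption: gang-schedule the whole group
                tasks=tasks,
            )
        )
    return plans


if __name__ == "__main__":
    spec = LeaderWorkerSetSpec(name="inference", replicas=2, size=4)
    for plan in plan_pod_groups(spec):
        print(plan)
```

With `replicas=2` and `size=4`, the sketch plans two PodGroups, each holding one leader task with 1 replica and one worker task with 3 replicas.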
In the future, if users would like Volcano to provide a native API such as vcserve for online services like multi-node inference, we may design one, but for now it is fine to follow LWS.
Additional context
My own research notes on multi-node distributed inference: https://docs.google.com/document/d/19Z0-hCdjKiL8AGA59NjZ-tj-ijDX8cFCKItZ2QqqJpY/edit?usp=sharing