Description
Describe the feature
- Need multi-node, multi-GPU deployment of one large LLM (DeepSeek 671B) with vLLM on Kubernetes (K8s)
- Use NFS to share the model parameters via a PV and PVC
- Use vLLM distributed inference and serving across multiple nodes and GPUs
- Extend the Prometheus and Grafana setup to support this topology (a scrape-config sketch follows this list)
- In practice it is hard for us to get one strong machine with 8x A100; we have many machines, but each has only one or two GPUs
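
On the monitoring point: vLLM already exposes Prometheus metrics at `/metrics` on its serving port, so a minimal sketch of the scrape side, assuming the Prometheus Operator is installed, could look like the following. The namespace, labels, and port name here are placeholders I made up for illustration, not values from an existing setup.

```yaml
# Hypothetical ServiceMonitor that scrapes every vLLM pod behind a Service
# labeled app: vllm-deepseek; adjust selector/namespace to the real setup.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-deepseek
  namespace: monitoring            # placeholder namespace
spec:
  namespaceSelector:
    matchNames:
      - vllm                       # placeholder namespace of the serving pods
  selector:
    matchLabels:
      app: vllm-deepseek           # placeholder label on the vLLM Service
  endpoints:
    - port: http                   # named port that serves the OpenAI API
      path: /metrics               # vLLM's built-in Prometheus endpoint
      interval: 15s
```

Grafana can then use that Prometheus as a data source, with per-pod panels keyed on the pod label.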
Why do you need this feature?
I have experience with Kubernetes (K8s) and have gone through the available tutorials. However, most guides focus on deploying models on a single node and do not cover multi-node deployment, inter-node communication, or best practices for scaling in Kubernetes.
Recently, we have been working on deploying DeepSeek 671B using containers and vLLM. I have successfully deployed a 7B LLM on a single node; my approach used Persistent Volumes (PV) and Persistent Volume Claims (PVC) backed by NFS to share the model parameters, which makes it easy to manage nodes and instances.
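
For reference, this is roughly the NFS layout I mean, as a minimal sketch; the server address, export path, and capacity below are placeholders, not values from a real cluster.

```yaml
# Static NFS-backed PV plus a PVC that binds to it; all serving pods mount
# the same weights read-only. Server, path, and size are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepseek-weights-pv
spec:
  capacity:
    storage: 1Ti                 # placeholder; size to the actual checkpoint
  accessModes:
    - ReadOnlyMany               # every replica reads the same weights
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10            # placeholder NFS server address
    path: /exports/models/deepseek   # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-weights
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""           # bind to the static PV above, not a provisioner
  resources:
    requests:
      storage: 1Ti
```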
I noticed that vLLM supports distributed deployment on bare-metal machines, but I could not find clear documentation on how to achieve the same thing in a Kubernetes environment; if such tutorials exist, they are not easy to find. It would be helpful to have guides on using NFS for distributed storage and on managing multi-node deployments efficiently in Kubernetes. One possible shape for such a deployment is sketched below.
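
Purely as an illustration of what I am asking for, here is a rough sketch built on the Kubernetes LeaderWorkerSet API (lws.x-k8s.io), with Ray bootstrapping a multi-node vLLM group; the image, parallelism sizes, paths, and commands are my assumptions, not a tested recipe.

```yaml
# One model instance spread across two 1-GPU pods: a leader that starts a Ray
# head and runs `vllm serve`, and a worker that joins the Ray cluster.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-deepseek
spec:
  replicas: 1                      # one serving instance spanning several pods
  leaderWorkerTemplate:
    size: 2                        # leader + 1 worker; scale to the real GPU count
    leaderTemplate:
      spec:
        containers:
          - name: vllm-leader
            image: vllm/vllm-openai:latest   # assumed image
            command: ["sh", "-c"]
            args:
              # Assumed flow: start the Ray head, then serve with TP x PP equal
              # to the total number of GPUs in the group (2 in this sketch).
              - >
                ray start --head --port=6379 &&
                vllm serve /models/deepseek
                --tensor-parallel-size 1
                --pipeline-parallel-size 2
            resources:
              limits:
                nvidia.com/gpu: "1"
            volumeMounts:
              - name: weights
                mountPath: /models
        volumes:
          - name: weights
            persistentVolumeClaim:
              claimName: deepseek-weights   # the NFS-backed PVC sketched above
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: vllm/vllm-openai:latest
            command: ["sh", "-c"]
            args:
              # LWS_LEADER_ADDRESS is injected by LeaderWorkerSet into workers.
              - ray start --address=$LWS_LEADER_ADDRESS:6379 --block
            resources:
              limits:
                nvidia.com/gpu: "1"
            volumeMounts:
              - name: weights
                mountPath: /models
        volumes:
          - name: weights
            persistentVolumeClaim:
              claimName: deepseek-weights
```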
Additional context
No response