
feature: Need multi-node multi-GPU deployment of one 671B DeepSeek LLM using vLLM and K8s #332

@TYsonHe

Describe the feature

We need a way to deploy one LLM (the 671B DeepSeek model) across multiple nodes and multiple GPUs using vLLM and K8s:

- use NFS to share the model parameters via a PV and PVC
- use vLLM distributed inference and serving across multiple nodes and GPUs
- extend the Prometheus and Grafana setup to monitor such deployments (see the sketch after this list)

In practice it is hard for us to get a single strong machine with 8x A100; we have many machines, but each has only one or two GPUs.
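For the monitoring point, one possible approach (assuming the Prometheus Operator / kube-prometheus-stack is installed; every name below is a placeholder) is a ServiceMonitor aimed at the /metrics endpoint that vLLM's OpenAI-compatible server already exposes, so Grafana can chart the metrics from Prometheus:

```yaml
# Hypothetical sketch: scrape vLLM's built-in /metrics endpoint with the
# Prometheus Operator. Names, namespaces, and labels are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-deepseek
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: ["llm-serving"]   # namespace of the vLLM Service (placeholder)
  selector:
    matchLabels:
      app: vllm                   # must match the labels on the vLLM Service
  endpoints:
    - port: http                  # named port on the Service (vLLM defaults to 8000)
      path: /metrics
      interval: 15s
```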

Why do you need this feature?

I have experience with Kubernetes (K8s) and have gone through the tutorials. However, most of the available guides focus on deploying models on a single node and do not cover multi-node deployment, inter-node communication, or best practices for scaling in Kubernetes.

Recently, we have been working on deploying DeepSeek 671B using containers and vLLM. While I have successfully deployed a 7B LLM on a single node, my approach used Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) backed by NFS to share the model parameters. This makes it easier to manage nodes and instances.
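As a concrete starting point, this is roughly the layout I used, scaled up for the 671B model (the NFS server address, export path, and capacity below are placeholders; the size just needs to be large enough for the checkpoint files):

```yaml
# Minimal sketch of the NFS-backed PV/PVC described above; every node can
# mount the same weights read-only. Server, path, and sizes are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deepseek-weights-pv
spec:
  capacity:
    storage: 1500Gi
  accessModes: ["ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10                  # placeholder NFS server
    path: /exports/models/deepseek-r1  # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-weights-pvc
  namespace: llm-serving
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: ""                 # bind statically to the PV above
  volumeName: deepseek-weights-pv
  resources:
    requests:
      storage: 1500Gi
```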

I noticed that vLLM supports distributed deployment on bare-metal machines, but I couldn't find clear documentation on how to achieve the same in a Kubernetes environment; if such tutorials exist, they are not easy to find. It would be helpful to have guides on using NFS for distributed storage and on managing multi-node deployments efficiently in Kubernetes.
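For reference, this is the kind of manifest I am imagining for the multi-node part. It is only a sketch under several assumptions: the LeaderWorkerSet (LWS) controller is installed, the leader pod starts a Ray head plus `vllm serve` while the workers join as Ray nodes, the leader address is available to workers via the `LWS_LEADER_ADDRESS` environment variable, and the image tag, model path, GPU counts, and parallelism sizes are all placeholders to verify against the vLLM docs:

```yaml
# Hypothetical sketch: one 671B model instance spanning 8 pods x 2 GPUs
# (tensor parallel 2 within a pod, pipeline parallel 8 across pods).
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-deepseek
  namespace: llm-serving
spec:
  replicas: 1                     # one logical model instance
  leaderWorkerTemplate:
    size: 8                       # 1 leader + 7 workers
    leaderTemplate:
      spec:
        containers:
          - name: vllm
            image: vllm/vllm-openai:latest   # placeholder tag
            command: ["/bin/sh", "-c"]
            # A robust setup would wait until all Ray workers have joined
            # before starting vllm serve; omitted here for brevity.
            args:
              - >-
                ray start --head --port=6379 &&
                vllm serve /models/deepseek-r1
                --tensor-parallel-size 2
                --pipeline-parallel-size 8
            resources:
              limits:
                nvidia.com/gpu: 2
            volumeMounts:
              - name: weights
                mountPath: /models
        volumes:
          - name: weights
            persistentVolumeClaim:
              claimName: deepseek-weights-pvc   # the NFS PVC from above
    workerTemplate:
      spec:
        containers:
          - name: ray-worker
            image: vllm/vllm-openai:latest
            command: ["/bin/sh", "-c"]
            args:
              - >-
                ray start --address=${LWS_LEADER_ADDRESS}:6379 --block
            resources:
              limits:
                nvidia.com/gpu: 2
            volumeMounts:
              - name: weights
                mountPath: /models
        volumes:
          - name: weights
            persistentVolumeClaim:
              claimName: deepseek-weights-pvc
```

If something like this (or the equivalent with a plain StatefulSet and a headless Service) were documented end to end, it would answer most of this issue.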

Additional context

No response
