Serving Llama3-405B on 2 GPU nodes with 8xH100 following the example in the repository #184

Closed · Fixed by #185
liurupeng opened this issue Jul 31, 2024 · 1 comment
Labels: kind/bug (Categorizes issue or PR as related to a bug.)
Comments

@liurupeng (Collaborator)

What happened:
I followed the vllm+ray+lws example, but hit this error when using 2 GPU nodes with 8 H100 GPUs each. Here is my deployment:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      metadata:
        labels:
          role: leader
      spec:
        containers:
          - name: vllm-leader
            # this image is built with the Dockerfile under ./build
            image: us-central1-docker.pkg.dev/gke-aishared-dev/vllm-ray-multihost/ray-vllm:v0.5.3.post1
            env:
              - name: RAY_CLUSTER_SIZE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['leaderworkerset.sigs.k8s.io/size']
              - name: HUGGING_FACE_HUB_TOKEN
                value: ""
            command:
              - sh
              - -c
              - "/vllm-workspace/ray_init.sh leader --ray_cluster_size=$RAY_CLUSTER_SIZE; 
                 python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2"
            resources:
              limits:
                nvidia.com/gpu: "8"
              requests:
                nvidia.com/gpu: "8"
            ports:
              - containerPort: 8080
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            # this image is built with the Dockerfile under ./build
            image: us-central1-docker.pkg.dev/gke-aishared-dev/vllm-ray-multihost/ray-vllm:v0.5.3.post1
            command:
              - sh
              - -c
              - "/vllm-workspace/ray_init.sh worker --ray_address=$(LEADER_NAME).$(LWS_NAME).$(NAMESPACE).svc.cluster.local"
            resources:
              limits:
                nvidia.com/gpu: "8"
              requests:
                nvidia.com/gpu: "8"
            env:
              - name: LEADER_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['leaderworkerset.sigs.k8s.io/leader-name']
              - name: NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
              - name: LWS_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.labels['leaderworkerset.sigs.k8s.io/name']
              - name: HUGGING_FACE_HUB_TOKEN
                value: ""

But I got this error: "ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node."
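
As a sanity check (just a sketch; the pod/container names follow the manifest above), I can exec into the leader and ask Ray what resources the cluster reports before vLLM builds its placement group:

# exec into the leader pod and print the Ray cluster's resource summary
kubectl exec -it vllm-0 -c vllm-leader -- ray status
# I would expect 2 nodes and 16 GPUs in total here; the ValueError suggests the
# placement group ends up with no GPU bundle on the driver (head) node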

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • LWS version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@liurupeng added the kind/bug label on Jul 31, 2024
@liurupeng (Collaborator, Author)

Seems related to vllm-project/vllm#2406. @gujingit, do you know if there is a workaround for this?
