Total number of attention heads (X) must be divisible by tensor parallel size (Y). #1041
tamastarjanyi asked this question in Q&A (unanswered)

I am trying to run falcon-7b on multiple nodes but am getting the error below. Which is ironic, since 71 is a prime number: I can run it on either 1 GPU (1 node) or 71 GPUs (71 nodes, one GPU each). Is there any way to avoid this problem?

My config: the Ray head runs on one node (actually a container within Kubernetes) via

ray start --head --dashboard-host 0.0.0.0 --num-gpus 1 --num-cpus 7

and a Ray worker runs in another container via

ray start --disable-usage-stats --num-gpus 1 --num-cpus 7 --address <address>

ray status looks fine, but when I try to launch falcon-7b via

python -m vllm.entrypoints.api_server --model tiiuae/falcon-7b --trust-remote-code --tensor-parallel-size 2 --port 8080 --engine-use-ray --worker-use-ray

the following error is raised:

Total number of attention heads (71) must be divisible by tensor parallel size (2).
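One way to avoid hitting the check only at launch time, offered as a hedged sketch rather than anything from the thread: read the head count from the Hugging Face config first, and only pass a --tensor-parallel-size that divides it. The field names here are assumptions (recent transformers configs expose num_attention_heads; older falcon checkpoints used n_head):

# Hedged sketch: check the head count before picking --tensor-parallel-size.
# Assumes the `transformers` package; config field names vary by architecture.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
heads = getattr(cfg, "num_attention_heads", None) or getattr(cfg, "n_head")
print(heads)           # 71 for falcon-7b
print(heads % 2 == 0)  # False -> tensor parallel size 2 will be rejected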
Replies: 3 comments
- Same problem for me with --tensor-parallel-size.
- Same error: 32 heads on 3 GPUs (see the divisor sketch below).
- Same error; I use starcoder2 and it reports the same message.