Description
### Your current environment
```shell
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  -tp=16 \
  -dp=1 \
  --port 8006 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --enable-expert-parallel \
  --compilation_config 0 \
  --gpu-memory-utilization 0.96 \
  --additional-config '{"expert_tensor_parallel_size":16, "ascend_scheduler_config":{}}' &> run.log &
```
The crux of the problem: with etp16 (`expert_tensor_parallel_size=16`), the expert computation is not processed in parallel.
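To illustrate the difference, here is a minimal sketch (not vLLM internals; the expert count of 256 and the placement logic are illustrative assumptions) contrasting expert parallelism (EP), where devices hold disjoint experts and work concurrently, with expert tensor parallelism (ETP) at size 16, where every expert's weights are sharded across all 16 devices, so no device-level parallelism across experts remains:

```python
# Hypothetical illustration, not vLLM code: expert-to-device placement
# under EP vs. ETP for a MoE layer.

NUM_EXPERTS = 256   # assumed routed-expert count for illustration
NUM_DEVICES = 16    # matches -tp=16 in the launch command above

# EP: experts are partitioned across devices; each device owns a disjoint
# subset, so different experts can run concurrently on different devices.
ep_placement = {
    d: list(range(d * NUM_EXPERTS // NUM_DEVICES,
                  (d + 1) * NUM_EXPERTS // NUM_DEVICES))
    for d in range(NUM_DEVICES)
}

# ETP with expert_tensor_parallel_size=16: each expert's weight matrices are
# sharded across all 16 devices, so every device participates in every
# expert rather than processing different experts in parallel.
etp_placement = {d: list(range(NUM_EXPERTS)) for d in range(NUM_DEVICES)}

print(len(ep_placement[0]))   # experts owned per device under EP -> 16
print(len(etp_placement[0]))  # experts touched per device under ETP -> 256
```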

### 🐛 Describe the bug
```shell
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  -tp=16 \
  -dp=1 \
  --port 8006 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --enable-expert-parallel \
  --compilation_config 0 \
  --gpu-memory-utilization 0.96 \
  --additional-config '{"expert_tensor_parallel_size":16, "ascend_scheduler_config":{}}' &> run.log &
```