
[Bug]: When moe ep=16 etp=1, the result is normal. When moe ep=1 etp=16, the result is abnormal. #971

@ttanzhiqiang

Description


Your current environment

Normal case (ep=16, etp=1):

```shell
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    -tp=16 \
    -dp=1 \
    --port 8006 \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --enable-expert-parallel \
    --compilation_config 0 \
    --gpu-memory-utilization 0.96 \
    --additional-config '{"expert_tensor_parallel_size":1, "ascend_scheduler_config":{}}' &> run.log &
```

Abnormal case (ep=1, etp=16):

```shell
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    -tp=16 \
    -dp=1 \
    --port 8006 \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --enable-expert-parallel \
    --compilation_config 0 \
    --gpu-memory-utilization 0.96 \
    --additional-config '{"expert_tensor_parallel_size":16, "ascend_scheduler_config":{}}' &> run.log &
```
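The only difference between the two launches is `expert_tensor_parallel_size` in `--additional-config`. As a minimal sketch, assuming the common convention that the expert-parallel and expert-tensor-parallel degrees multiply to the tensor-parallel world size (the function name here is illustrative, not vllm-ascend's actual internal API):

```python
def moe_parallel_sizes(tp_size: int, etp_size: int) -> tuple[int, int]:
    """Split the tensor-parallel group into an expert-parallel (ep) factor
    and an expert-tensor-parallel (etp) factor.

    Assumption (hypothetical, for illustration): ep_size * etp_size == tp_size.
    """
    assert tp_size % etp_size == 0, "etp_size must divide tp_size"
    ep_size = tp_size // etp_size
    return ep_size, etp_size

# Normal case from the first command:  etp=1  -> ep=16 (experts spread over ranks)
print(moe_parallel_sizes(16, 1))   # (16, 1)
# Abnormal case from the second one:  etp=16 -> ep=1  (no expert-level split)
print(moe_parallel_sizes(16, 16))  # (1, 16)
```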


The principle of the problem: in the etp=16 case, the experts are not processed in parallel across ranks.
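A rough illustration of why etp=16 removes expert-level parallelism. Under ep=16 each rank owns a disjoint block of whole experts and the 16 ranks compute different experts concurrently; under etp=16 every rank instead holds a thin slice of every expert, so all ranks must cooperate on each expert. The expert count and weight dimensions below are placeholders, not values taken from DeepSeek-R1:

```python
NUM_EXPERTS = 256   # illustrative routed-expert count
WORLD_SIZE = 16

def experts_on_rank_ep(rank: int, ep_size: int = WORLD_SIZE) -> list[int]:
    """ep=16, etp=1: each rank owns NUM_EXPERTS/ep_size whole experts,
    so different ranks run different experts in parallel."""
    per_rank = NUM_EXPERTS // ep_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))

def expert_shard_shape_etp(hidden: int = 7168, inter: int = 2048,
                           etp_size: int = WORLD_SIZE) -> tuple[int, int, int]:
    """ep=1, etp=16: every rank holds a 1/etp_size column slice of every
    expert's weights; no expert is computed independently on one rank."""
    return (NUM_EXPERTS, hidden, inter // etp_size)

# rank 0 and rank 1 compute disjoint expert sets under ep=16 ...
print(experts_on_rank_ep(0))      # experts 0..15
print(experts_on_rank_ep(1))      # experts 16..31
# ... but under etp=16 every rank touches all 256 experts:
print(expert_shard_shape_etp())   # (256, 7168, 128)
```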

🐛 Describe the bug

```shell
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    -tp=16 \
    -dp=1 \
    --port 8006 \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --enable-expert-parallel \
    --compilation_config 0 \
    --gpu-memory-utilization 0.96 \
    --additional-config '{"expert_tensor_parallel_size":16, "ascend_scheduler_config":{}}' &> run.log &
```
