feat: support data parallel for deepseek #1012
Merged
          Conversation
  
    
Force-pushed from c6b1878 to 114f072
Force-pushed from 114f072 to 76c143f
Please do a rebase.
Force-pushed from 76c143f to 76b0b46
Force-pushed from 650fe8f to 7c57994
Force-pushed from 7229ab6 to b22de07
    Signed-off-by: boying <897013703@qq.com>
      
        
      
      
  
              
wangxiyuan approved these changes on Jun 4, 2025
            
              
Yikun approved these changes on Jun 4, 2025
@ganyi1996ppo Please do a final review
I confirmed with @ganyi1996ppo offline; it's OK to merge.
    
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request on Jun 4, 2025
### What this PR does / why we need it?
feat: support data parallel for deepseek

### Does this PR introduce _any_ user-facing change?
Yes, data parallel (DP) inference is now supported for DeepSeek.

### How was this patch tested?
```
export VLLM_ENABLE_MC2=0
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
nohup python -m vllm.entrypoints.openai.api_server \
    --model=/path/to/DeepSeek-R1-W8A8 \
    --quantization ascend \
    --served-model-name auto \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    --port 8006 \
    -tp=8 \
    -dp=2 \
    --max-num-seqs 24 \
    --max-model-len 4096 \
    --max-num-batched-tokens 4096 \
    --block-size 128 \
    -O 0 \
    --no-enable-prefix-caching \
    --additional-config '{"torchair_graph_batch_sizes":[24],"expert_tensor_parallel_size":16,"ascend_scheduler_config":{},"enable_graph_mode":true}' \
    --gpu-memory-utilization 0.95 &> run.log &
disown
```
Signed-off-by: boying <897013703@qq.com>
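A quick way to check that the server launched by the script above responds is to send a request with the `openai` Python client. This is a minimal sketch under a few assumptions: the server is reachable on localhost:8006, the served model name is `auto` (per `--served-model-name auto`), the `openai` package is installed, and any placeholder API key is accepted.

```
# Minimal smoke test against the server launched above.
# Assumptions: server on localhost:8006, served model name "auto",
# `pip install openai` available, no real API key required.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8006/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="auto",  # matches --served-model-name auto
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```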
    
    
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request on Jun 4, 2025
    
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request on Jun 4, 2025
    
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request on Jun 5, 2025
    
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Oct 16, 2025
    
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Oct 16, 2025

…llm-project#1094)

### What this PR does / why we need it?
Add `with_prefill_across_dp` to AscendMetadata to fix dp.

This PR fixes the bug introduced by vllm-project#1012, which added an arg `with_prefill_across_dp` when dp_size > 1.

Signed-off-by: MengqingCao <cmq0113@163.com>
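For context on that follow-up fix: judging by its description, the change adds a `with_prefill_across_dp` flag to the attention metadata, populated only when dp_size > 1. The snippet below is a hypothetical, simplified illustration of that pattern; it is not the actual AscendMetadata definition from vllm-ascend, and every name other than `with_prefill_across_dp` is invented.

```
# Hypothetical sketch of the pattern described above (not the real vllm-ascend
# AscendMetadata class; all names except with_prefill_across_dp are invented).
from dataclasses import dataclass

@dataclass
class ToyMetadata:
    num_prefill_tokens: int
    # New flag from the fix: whether any DP rank still has prefill work this step.
    with_prefill_across_dp: bool = False

def build_toy_metadata(num_prefill_tokens: int, dp_size: int,
                       any_rank_has_prefill: bool) -> ToyMetadata:
    # The flag is only meaningful when data parallelism is enabled (dp_size > 1).
    return ToyMetadata(
        num_prefill_tokens=num_prefill_tokens,
        with_prefill_across_dp=(dp_size > 1 and any_rank_has_prefill),
    )
```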
    
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
    
    
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
  
  
    
  
    