Conversation

@tvoas tvoas commented May 12, 2025

  • Enables the PP solution with full support for DeepSeek R1 execution with PP>1 (see the launch sketch after this list).
  • Requires 1.21.0 or newer. Does not support 1.20.1 or older.
  • Implementation mirrors "Implement Pipeline Parallelism support for HPU" (#1000) as closely as possible while ensuring DeepSeek R1 functions fully.
  • Adds a benchmark script for sweeping various configs automatically. This can be removed if you feel it shouldn't merge to the deepseek_r1 branch.
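
As a rough illustration of what the PP>1 enablement looks like from the user side, here is a minimal offline-inference sketch. It is not code from this PR; it assumes the fork exposes pipeline_parallel_size through the standard vLLM LLM entrypoint, and the model path and parallel sizes are placeholders.

```python
from vllm import LLM, SamplingParams

# Hypothetical launch sketch: split DeepSeek R1 across two pipeline stages,
# with tensor parallelism inside each stage. Sizes and model path are
# placeholders, not values taken from this PR.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1",
    tensor_parallel_size=8,
    pipeline_parallel_size=2,  # PP>1 path enabled by this PR
)

outputs = llm.generate(
    ["Explain pipeline parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```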

Additional validation is being done by yabai.hu@intel.com.

@czhu15 youlei.yang@intel.com please help start the review in the meantime.


if [ "$KV_CACHE_DTYPE" = "fp8_inc" ]; then
export VLLM_USE_FP8_MATMUL="true"
export VLLM_USE_SINGLE_TENSOR_CACHE="1"

Why do we need to set VLLM_USE_FP8_MATMUL from the application? Shouldn't INC handle it?
And what is VLLM_USE_SINGLE_TENSOR_CACHE for? I can't find it in the vLLM code...

Author

Good catch. This was a leftover from Liu, Yi's FP8 matmul implementation on the PRC local version of this branch. I see that the implementation provided on the deepseek_r1 branch is different, though, and no longer uses VLLM_USE_SINGLE_TENSOR_CACHE.

We still need VLLM_USE_FP8_MATMUL though, right? This environment variable should be set for improved performance according to @yiliu30.

@yiliu30 yiliu30 May 14, 2025

Yes, the VLLM_USE_SINGLE_TENSOR_CACHE flag should be removed, as we have been using one tensor for KVCache since #977.

We still need VLLM_USE_FP8_MATMUL for FP8 Q@K and FP8 A@V, since the current path does not use INC to replace Matmul with PatchedMatmul. Instead, it manually patches Q@K and A@V and uses 1.0 as the scale. Please refer to #977 for more details.

cc @xuechendi
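
For readers following along, a minimal sketch of the manual patching described above (not the actual HabanaAI/vllm-fork code; the function names are hypothetical): when VLLM_USE_FP8_MATMUL is set, Q@K^T and A@V are quantized to FP8 with a fixed scale of 1.0, rather than going through INC's PatchedMatmul.

```python
import os
import torch

# Minimal sketch of the behavior described above; not the fork's real code.
# When VLLM_USE_FP8_MATMUL is set, Q@K^T (and likewise A@V) runs through an
# FP8 matmul with a fixed scale of 1.0 instead of INC's PatchedMatmul.
USE_FP8_MATMUL = os.environ.get("VLLM_USE_FP8_MATMUL", "false").lower() in ("1", "true")

def fp8_matmul(a: torch.Tensor, b: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # Quantize both operands to FP8 (E4M3) with the fixed scale, then emulate
    # the FP8 GEMM by upcasting; the real path would dispatch to an HPU FP8 kernel.
    a_q = (a * scale).to(torch.float8_e4m3fn)
    b_q = (b * scale).to(torch.float8_e4m3fn)
    out = torch.matmul(a_q.to(a.dtype), b_q.to(b.dtype))
    return out / (scale * scale)

def attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # Q@K^T takes the FP8 path only when the environment variable is set.
    if USE_FP8_MATMUL:
        return fp8_matmul(q, k.transpose(-1, -2))
    return torch.matmul(q, k.transpose(-1, -2))
```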

@tvoas tvoas force-pushed the enable_pp_g2d_global branch 2 times, most recently from 7cd244f to 2f9b8b1 Compare May 14, 2025 01:54
@tvoas tvoas requested review from czhu15 and xinyu-intel May 14, 2025 01:56
@tvoas tvoas force-pushed the enable_pp_g2d_global branch 3 times, most recently from d8ef146 to 05e298d Compare May 15, 2025 05:46
Co-authored-by: Hu, Yabai <yabai.hu@intel.com>
Co-authored-by: Ji, Kunshang <kunshang.ji@intel.com>
Co-authored-by: Sheng, Yi <yi.sheng@intel.com>
Co-authored-by: Chen, Xinyu <xinyu1.chen@intel.com>
Co-authored-by: Voas, Tanner <tanner.voas@intel.com>
Signed-off-by: Voas, Tanner <tanner.voas@intel.com>
@tvoas tvoas force-pushed the enable_pp_g2d_global branch from 05e298d to ba9bf97 Compare May 15, 2025 07:57
@czhu15 czhu15 merged commit 6767058 into HabanaAI:deepseek_r1 May 15, 2025
1 check failed
@tvoas tvoas deleted the enable_pp_g2d_global branch June 4, 2025 00:22