[Core] Allow disabling TP sharding for parallel Linear layer #23024

Isotr0py · 2025-08-16T09:23:32Z

Purpose

Refer to #[Feature]: Generalized the DP feature for ViT and multimodal backbone for the benefit of all models #22743, we need to disable TP at ViT if enable DP for it.
Currently, to achieve this, we need to replaced all parallel linear layers to replicated linear, which will make model implementation too complicated if it needs weights packing logic. Furthermore, to fuse qkv_proj and gate_up_proj, we need to implement extra layers like MergedReplicatedParallelLinear etc, increasing maintenance burden.
This PR allows parallel linear layers falling back to replicated mode with disable_tp=True.
Will update if [FEAT] [Performance] Enable DP for ViT in Qwen2.5VL #22742 merged.

Test Plan

python examples/offline_inference/basic/generate.py --model RedHatAI/DeepSeek-V2.5-1210-FP8 -tp 4 --max-model-len 4096 --enforce-eager

Test Result

FP8 deepseek-v3 model can still work with fused_qkv_a_proj after reverting MergedReplicatedParallelLinear back to MergedColumnParallelLinear

(Optional) Documentation Update

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

github-actions · 2025-08-16T09:23:41Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request introduces a disable_tp flag to parallel linear layers, allowing them to fall back to a replicated mode. This is a valuable feature for models that need to conditionally disable tensor parallelism, for instance when data parallelism is enabled. The implementation across the various linear layer classes in vllm/model_executor/layers/linear.py is clean and consistent. The refactoring in other model files to leverage this new flag simplifies the code and demonstrates its utility. I have identified a critical typo and a potential prefixing issue in vllm/model_executor/models/step3_vl.py that should be addressed to ensure correctness, particularly for quantized models.

vllm/model_executor/models/step3_vl.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com>

vllm/model_executor/layers/linear.py

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

DarkLight1337

cc @mgoin

mergify · 2025-08-19T15:56:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py · 2025-08-26T14:57:22Z

@mgoin Can you please take a look to this PR? It could help us simplify the work to finish #22743. Thanks!

…llm-project#23024)" This reverts commit 53b19cc.

minosfuture · 2025-09-06T08:10:10Z

I think this PR broke TP on trunk. I see errors like

(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588] Traceback (most recent call last):
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/v1/executor/multiproc_executor.py", line 562, in worker_main
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     worker = WorkerProc(*args, **kwargs)
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/v1/executor/multiproc_executor.py", line 431, in __init__
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     self.worker.load_model()
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/v1/worker/gpu_model_runner.py", line 2131, in load_model
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     self.model = model_loader.load_model(
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     self.load_weights(model, model_config)
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     loaded_weights = model.load_weights(
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]                      ^^^^^^^^^^^^^^^^^^^
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/models/deepseek_v2.py", line 906, in load_weights
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     weight_loader(param, loaded_weight, shard_id)
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/layers/linear.py", line 807, in weight_loader_v2
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     param.load_merged_column_weight(loaded_weight=loaded_weight,
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]   File "/data/users/yming/gitrepos/vllm/vllm/model_executor/parameter.py", line 145, in load_merged_column_weight
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]     loaded_weight = loaded_weight.narrow(self.output_dim,
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker TP5 pid=969158) ERROR 09-06 00:42:40 [multiproc_executor.py:588] IndexError: start out of range (expected to be in range of [-576, 576], but got 2880)

reverting this fixed it. Lemme know if you need reproducing steps. I think kimi TP16 or deepseek DP2TP8EP would reproduce it.

Isotr0py · 2025-09-06T08:15:45Z

I think kimi TP16 or deepseek DP2TP8EP would reproduce it.

Ooops, can you provide the code for reproduction?

minosfuture · 2025-09-06T08:23:56Z

#node 0
vllm serve deepseek-ai/DeepSeek-V3-0324   --tensor-parallel-size 8   --enable-expert-parallel   --data-parallel-size 2   --data-parallel-size-local 1   --data-parallel-address $MASTER_IP  --data-parallel-rpc-port 13345

#node 1
vllm serve deepseek-ai/DeepSeek-V3-0324   --tensor-parallel-size 8   --enable-expert-parallel   --data-parallel-size 2   --data-parallel-size-local 1   --data-parallel-start-rank 1   --data-parallel-address  $MASTER_IP   --data-parallel-rpc-port 13345   --headless

I think single node test of a smaller model should also be able to reproduce it but I haven't tried.

Isotr0py · 2025-09-06T12:57:00Z

Hi @minosfuture, I think #24367 should fix this issue. Can you have a look? Thanks!

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

…oject#23024) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

### What this PR does / why we need it? 1. Initial support disable tp for integrating with [vllm-commit](vllm-project/vllm#23024) 2. [vllm@commit](vllm-project/vllm#23673) now use `bytes` to save the `BlockHash` to reduce GC overhead, this pr add the integration - vLLM version: main - vLLM main: vllm-project/vllm@e408272 --------- Signed-off-by: wangli <wangli858794774@gmail.com>

…oject#23024) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

### What this PR does / why we need it? 1. Initial support disable tp for integrating with [vllm-commit](vllm-project/vllm#23024) 2. [vllm@commit](vllm-project/vllm#23673) now use `bytes` to save the `BlockHash` to reduce GC overhead, this pr add the integration - vLLM version: main - vLLM main: vllm-project/vllm@e408272 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>

### What this PR does / why we need it? 1. Initial support disable tp for integrating with [vllm-commit](vllm-project/vllm#23024) 2. [vllm@commit](vllm-project/vllm#23673) now use `bytes` to save the `BlockHash` to reduce GC overhead, this pr add the integration - vLLM version: main - vLLM main: vllm-project/vllm@e408272 --------- Signed-off-by: wangli <wangli858794774@gmail.com>

…oject#23024) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

### What this PR does / why we need it? 1. Initial support disable tp for integrating with [vllm-commit](vllm-project/vllm#23024) 2. [vllm@commit](vllm-project/vllm#23673) now use `bytes` to save the `BlockHash` to reduce GC overhead, this pr add the integration - vLLM version: main - vLLM main: vllm-project/vllm@e408272 --------- Signed-off-by: wangli <wangli858794774@gmail.com>

…oject#23024) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Isotr0py added 3 commits August 16, 2025 16:41

allow disable tp for linear layer

b2208df

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

step-3

e49b2db

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

deepseek v2

21c1a67

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the deepseek Related to DeepSeek models label Aug 16, 2025

gemini-code-assist bot reviewed Aug 16, 2025

View reviewed changes

vllm/model_executor/models/step3_vl.py Show resolved Hide resolved

vllm/model_executor/models/step3_vl.py Show resolved Hide resolved

Update vllm/model_executor/models/step3_vl.py

594e4f2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com>

huachenheli reviewed Aug 17, 2025

View reviewed changes

vllm/model_executor/layers/linear.py Show resolved Hide resolved

Isotr0py added 5 commits August 19, 2025 14:47

Merge branch 'main' into disable-linear-tp

c31a4a0

fix

6f28c74

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

remove unused MergedReplicatedLinear

a35e239

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

gemini

02d0f26

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge branch 'main' into disable-linear-tp

3abe26c

Isotr0py changed the title ~~[WIP] Allow disabling TP sharding for parallel Linear layer~~ [Core] Allow disabling TP sharding for parallel Linear layer Aug 19, 2025

Isotr0py marked this pull request as ready for review August 19, 2025 13:54

Isotr0py requested review from mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners August 19, 2025 13:54

DarkLight1337 reviewed Aug 19, 2025

View reviewed changes

mergify bot added the needs-rebase label Aug 19, 2025

Merge branch 'main' into disable-linear-tp

4549319

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot removed the needs-rebase label Aug 19, 2025

update qwen2.5-vl

32108be

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py requested a review from sighingnow as a code owner August 19, 2025 16:22

mergify bot added the qwen Related to Qwen models label Aug 19, 2025

Isotr0py mentioned this pull request Aug 22, 2025

[Model] Add Ovis2.5 PP support #23405

Merged

4 tasks

Merge branch 'main' into disable-linear-tp

793e05b

Merge branch 'main' into disable-linear-tp

85a73c2

Isotr0py requested a review from 22quinn as a code owner September 6, 2025 03:12

simon-mo merged commit 53b19cc into vllm-project:main Sep 6, 2025
40 of 43 checks passed

Isotr0py deleted the disable-linear-tp branch September 6, 2025 06:07

Isotr0py mentioned this pull request Sep 6, 2025

[VLM] Migrate remain DP-supported ViT models to use disable_tp #24363

Merged

8 tasks

minosfuture added a commit to minosfuture/vllm that referenced this pull request Sep 6, 2025

Revert "[Core] Allow disabling TP sharding for parallel Linear layer (v…

90e07ab

…llm-project#23024)" This reverts commit 53b19cc.

Isotr0py mentioned this pull request Sep 6, 2025

[Bugfix] Fix broken deepseek fp8 TP weights loading #24367

Merged

5 tasks

tlrmchlsmth added a commit to tlrmchlsmth/vllm that referenced this pull request Sep 7, 2025

handle change from vllm-project#23024

d92eae0

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Potabk mentioned this pull request Sep 9, 2025

[Bugfix] Fix broken CI vllm-project/vllm-ascend#2825

Merged

mickaelseznec mentioned this pull request Sep 15, 2025

DeepSeek fix: awq x mergedreplicatedlinear #23764

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Core] Allow disabling TP sharding for parallel Linear layer #23024

[Core] Allow disabling TP sharding for parallel Linear layer #23024

Uh oh!

Isotr0py commented Aug 16, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Aug 16, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DarkLight1337 left a comment

Uh oh!

mergify bot commented Aug 19, 2025

Uh oh!

Isotr0py commented Aug 26, 2025

Uh oh!

Uh oh!

minosfuture commented Sep 6, 2025

Uh oh!

Isotr0py commented Sep 6, 2025

Uh oh!

minosfuture commented Sep 6, 2025

Uh oh!

Isotr0py commented Sep 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Uh oh!

[Core] Allow disabling TP sharding for parallel Linear layer #23024

[Core] Allow disabling TP sharding for parallel Linear layer #23024

Uh oh!

Conversation

Isotr0py commented Aug 16, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

(Optional) Documentation Update

Uh oh!

github-actions bot commented Aug 16, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Aug 19, 2025

Uh oh!

Isotr0py commented Aug 26, 2025

Uh oh!

Uh oh!

minosfuture commented Sep 6, 2025

Uh oh!

Isotr0py commented Sep 6, 2025

Uh oh!

minosfuture commented Sep 6, 2025

Uh oh!

Isotr0py commented Sep 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Isotr0py commented Aug 16, 2025 •

edited by github-actions bot

Loading