[Model] Enable DP for ViT in Qwen2-VL #25445

DarkLight1337 · 2025-09-23T03:05:05Z

Purpose

Test Plan

Test Result

$ vllm bench serve \
    --backend openai-chat \
    --model Qwen/Qwen2-VL-7B-Instruct \
    --endpoint /v1/chat/completions \
    --dataset-name hf \
    --dataset-path lmarena-ai/VisionArena-Chat \
    --hf-split train \
    --num-prompts 500

$ vllm serve Qwen/Qwen2-VL-7B-Instruct -tp 2 --mm-encoder-tp-mode weights --limit_mm_per_prompt.image=1
============ Serving Benchmark Result ============
Successful requests:                     500       
Benchmark duration (s):                  159.99    
Total input tokens:                      34073     
Total generated tokens:                  49892     
Request throughput (req/s):              3.13      
Output token throughput (tok/s):         311.84    
Peak output token throughput (tok/s):    2314.00   
Peak concurrent requests:                500.00    
Total Token throughput (tok/s):          524.80    
---------------Time to First Token----------------
Mean TTFT (ms):                          62900.15  
Median TTFT (ms):                        52084.25  
P99 TTFT (ms):                           152029.84 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          606.95    
Median TPOT (ms):                        711.70    
P99 TPOT (ms):                           791.46    
---------------Inter-token Latency----------------
Mean ITL (ms):                           584.34    
Median ITL (ms):                         716.03    
P99 ITL (ms):                            1033.28   
==================================================

$ vllm serve Qwen/Qwen2-VL-7B-Instruct -tp 2 --mm-encoder-tp-mode data --limit_mm_per_prompt.image=1
============ Serving Benchmark Result ============
Successful requests:                     500       
Benchmark duration (s):                  128.82    
Total input tokens:                      34073     
Total generated tokens:                  49656     
Request throughput (req/s):              3.88      
Output token throughput (tok/s):         385.46    
Peak output token throughput (tok/s):    2297.00   
Peak concurrent requests:                500.00    
Total Token throughput (tok/s):          649.95    
---------------Time to First Token----------------
Mean TTFT (ms):                          50187.68  
Median TTFT (ms):                        40427.38  
P99 TTFT (ms):                           120921.60 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          488.14    
Median TPOT (ms):                        562.51    
P99 TPOT (ms):                           630.13    
---------------Inter-token Latency----------------
Mean ITL (ms):                           475.18    
Median ITL (ms):                         578.27    
P99 ITL (ms):                            728.98    
==================================================
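To make the two runs easier to compare, here is a small Python sketch that computes the relative change for a few metrics. The numbers are copied from the benchmark output above. This is plain arithmetic on the reported values and is not part of vLLM or `vllm bench`.

```python
# Metrics copied from the two benchmark runs above (both use -tp 2).
weights_mode = {  # --mm-encoder-tp-mode weights (ViT weights sharded with TP)
    "duration_s": 159.99,
    "request_throughput": 3.13,
    "output_tok_throughput": 311.84,
    "mean_ttft_ms": 62900.15,
    "mean_tpot_ms": 606.95,
    "p99_itl_ms": 1033.28,
}
data_mode = {  # --mm-encoder-tp-mode data (ViT replicated, images split across ranks)
    "duration_s": 128.82,
    "request_throughput": 3.88,
    "output_tok_throughput": 385.46,
    "mean_ttft_ms": 50187.68,
    "mean_tpot_ms": 488.14,
    "p99_itl_ms": 728.98,
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (positive means larger)."""
    return (after - before) / before * 100.0


for name in weights_mode:
    delta = relative_change(weights_mode[name], data_mode[name])
    print(f"{name:>22}: {weights_mode[name]:>10.2f} -> {data_mode[name]:>10.2f} ({delta:+.1f}%)")
```

On these numbers, `data` mode comes out at roughly 24% higher request throughput and about 20% lower mean TTFT than `weights` mode.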


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

gemini-code-assist

Code Review

This pull request introduces data parallelism support for the Vision Transformer in Qwen2-VL. The changes are well-structured and primarily involve plumbing a use_data_parallel flag through the vision model components to conditionally disable tensor parallelism. The logic for handling data-parallel execution paths appears correct. Overall, the changes are sound and should enable the intended data parallelism functionality.
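The review describes threading a `use_data_parallel` flag through the vision components so that tensor parallelism is switched off when the flag is set. Below is a minimal pure-Python sketch of that idea. It assumes a single column-parallel linear layer, and `ToyVisionLinear`, `shard_for_rank` and the list-based `matmul` are hypothetical names for illustration, not vLLM's classes. With the flag off, each simulated rank holds a slice of the weight columns and the partial outputs are concatenated. With it on, every rank keeps the full weight and the layer itself needs no cross-rank communication.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


@dataclass
class ToyVisionLinear:
    """Hypothetical layer illustrating the flag; not vLLM's implementation."""
    weight: Matrix  # full (in_features x out_features) weight
    tp_size: int
    use_data_parallel: bool = False

    def shard_for_rank(self, rank: int) -> Matrix:
        if self.use_data_parallel:
            # DP: tensor parallelism is disabled, every rank keeps the full weight.
            return self.weight
        # TP: column-parallel split, each rank keeps out_features // tp_size columns.
        out_features = len(self.weight[0])
        assert out_features % self.tp_size == 0, "columns must divide evenly"
        step = out_features // self.tp_size
        return [row[rank * step:(rank + 1) * step] for row in self.weight]

    def forward(self, x: Matrix) -> Matrix:
        if self.use_data_parallel:
            # DP: one local matmul with the full weight; the batch itself would
            # already have been split across ranks before reaching this layer.
            return matmul(x, self.weight)
        # TP: simulate every rank computing its slice of the output columns, then
        # concatenate the slices (an all-gather across GPUs in a real deployment).
        partials = [matmul(x, self.shard_for_rank(r)) for r in range(self.tp_size)]
        return [sum((p[i] for p in partials), []) for i in range(len(x))]


w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x = [[1.0, 1.0]]
print(ToyVisionLinear(w, tp_size=2).forward(x))
print(ToyVisionLinear(w, tp_size=2, use_data_parallel=True).forward(x))
```

Both paths produce the same output; what changes is where the weights live and whether a collective is needed inside the layer.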

Isotr0py

LGTM


gongshanchong · 2025-10-17T02:40:07Z

I tested the DP strategy in two different hardware environments and used nccl-tests to measure GPU communication in each: the average bus bandwidth was 23.8703 GB/s on L20 and 282.133 GB/s on H800. With Qwen2.5-VL-7B-Instruct deployed, the TP strategy outperformed the DP strategy whenever concurrency was below roughly 256. Why is this? Is it because communication is not yet a bottleneck at low concurrency, so the computational benefit of TP outweighs the communication savings of DP? And once concurrency reaches a certain level, does communication in the ViT module become the bottleneck, so that the communication savings of DP outweigh the computational benefit of TP?
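Roughly speaking, in `data` mode each rank runs the full vision encoder on its own subset of the images in a batch, and the resulting embeddings are gathered afterwards. The sketch below illustrates that flow in plain Python under stated assumptions: images are split with a hypothetical greedy longest-first heuristic that balances the total patch count per rank, and `fake_vit` is a placeholder for the replicated ViT. vLLM's actual sharding helper may use a different scheme. One property of this flow worth keeping in mind for the low-concurrency case: when a batch holds fewer images than ranks, some ranks receive no work at all.

```python
from typing import List, Optional


def shard_images_by_patches(patch_counts: List[int], num_ranks: int) -> List[List[int]]:
    """Hypothetical greedy assignment of image indices to ranks so that the
    total number of patches per rank is roughly balanced."""
    loads = [0] * num_ranks
    assignment: List[List[int]] = [[] for _ in range(num_ranks)]
    # Place the largest images first; each goes to the currently least-loaded rank.
    for idx in sorted(range(len(patch_counts)), key=lambda i: -patch_counts[i]):
        rank = min(range(num_ranks), key=lambda r: loads[r])
        assignment[rank].append(idx)
        loads[rank] += patch_counts[idx]
    return assignment


def gather_in_original_order(assignment: List[List[int]],
                             per_rank_outputs: List[List[str]]) -> List[Optional[str]]:
    """Stand-in for the all-gather: put each rank's embeddings back at the
    positions of the images that rank processed."""
    result: List[Optional[str]] = [None] * sum(len(a) for a in assignment)
    for indices, outputs in zip(assignment, per_rank_outputs):
        for idx, emb in zip(indices, outputs):
            result[idx] = emb
    return result


def fake_vit(patches: int) -> str:
    # Placeholder for running the full (replicated) ViT on one image.
    return f"emb[{patches}]"


patch_counts = [1024, 256, 576, 256, 64]
assignment = shard_images_by_patches(patch_counts, num_ranks=2)
per_rank = [[fake_vit(patch_counts[i]) for i in idxs] for idxs in assignment]
embeddings = gather_in_original_order(assignment, per_rank)
print(assignment, embeddings)
```

Balancing by patch count rather than by number of images is a design choice in this sketch: with dynamic-resolution models like Qwen2-VL, a single large image can dominate a rank's encoder time.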

[Model] Enable DP for ViT in Qwen2-VL

7705995

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested review from Isotr0py and ywang96 September 23, 2025 03:05

DarkLight1337 requested a review from sighingnow as a code owner September 23, 2025 03:05

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 23, 2025

DarkLight1337 mentioned this pull request Sep 23, 2025

[Feature]: Generalized the DP feature for ViT and multimodal backbone for the benefit of all models #22743

Closed

1 task

mergify bot added the qwen label (Related to Qwen models) Sep 23, 2025

gemini-code-assist bot reviewed Sep 23, 2025


Isotr0py approved these changes Sep 23, 2025


DarkLight1337 enabled auto-merge (squash) September 23, 2025 04:51

DarkLight1337 merged commit c98be0a into vllm-project:main Sep 23, 2025
52 of 53 checks passed

DarkLight1337 deleted the vit-dp-qwen2-vl branch September 23, 2025 05:17

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Model] Enable DP for ViT in Qwen2-VL (vllm-project#25445)

77bf215

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025

[Model] Enable DP for ViT in Qwen2-VL (vllm-project#25445)

befa4d0

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: charlifu <charlifu@amd.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Model] Enable DP for ViT in Qwen2-VL (#25445)

0a1397c

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yewentao256 <zhyanwentao@126.com>

DarkLight1337 mentioned this pull request Oct 4, 2025

Fix tensor device and dtype placement in Qwen2VL model #26219

Merged

gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025

[Model] Enable DP for ViT in Qwen2-VL (vllm-project#25445)

3a86340

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: gaojc <1055866782@qq.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Model] Enable DP for ViT in Qwen2-VL (vllm-project#25445)

94675f3

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Model] Enable DP for ViT in Qwen2-VL (vllm-project#25445)

407839b

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>


3 participants
