[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) #24667
Conversation
… sizes. Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Force-pushed from 127ad66 to fb73abd
Code Review
This pull request attempts to fix a CUDA graph capture condition for large batch sizes in the GDNAttentionBackend. However, the added condition m.num_actual_tokens <= self.decode_cudagraph_max_bs introduces a unit mismatch by comparing a token count with a sequence limit. This makes the check overly restrictive and prevents CUDA graph usage in many valid scenarios. My review provides a detailed explanation of the issue and suggests a more accurate approach to fix the underlying problem.
```diff
 if (self.use_full_cuda_graph and num_prefills == 0 and num_decodes == 0
-        and num_spec_decodes <= self.decode_cudagraph_max_bs):
+        and num_spec_decodes <= self.decode_cudagraph_max_bs
+        and m.num_actual_tokens <= self.decode_cudagraph_max_bs):
```
This condition m.num_actual_tokens <= self.decode_cudagraph_max_bs appears to have a unit mismatch. m.num_actual_tokens is the number of tokens, while self.decode_cudagraph_max_bs is used as a limit on the number of sequences for sizing tensors like spec_state_indices_tensor and spec_sequence_masks.
Comparing tokens to sequences is likely incorrect and makes this check overly restrictive. For instance, with num_spec=7 and decode_cudagraph_max_bs=32, this change limits num_spec_decodes to 4 (since 4 * 8 <= 32), whereas the original code allowed up to 32 sequences.
The underlying issue is that batch_size can exceed self.decode_cudagraph_max_bs due to token padding. The batch_size is calculated as self.vllm_config.pad_for_cudagraph(m.num_actual_tokens) // (self.num_spec + 1).
A more accurate check would be to compute this batch_size and compare it against self.decode_cudagraph_max_bs, while also ensuring m.num_actual_tokens does not exceed self.compilation_config.max_capture_size to prevent errors from pad_for_cudagraph.
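A minimal sketch of that suggested check, written as a standalone function with the attribute names quoted in this thread (num_actual_tokens, num_spec, decode_cudagraph_max_bs, max_capture_size, pad_for_cudagraph) passed in as parameters; the actual guard in vLLM may differ:

```python
# Illustrative sketch, not the vLLM implementation. `pad_for_cudagraph` stands in
# for vLLM's helper that rounds a token count up to the nearest captured graph size.
from typing import Callable

def can_use_full_cuda_graph(
    num_actual_tokens: int,
    num_spec: int,
    decode_cudagraph_max_bs: int,
    max_capture_size: int,
    pad_for_cudagraph: Callable[[int], int],
) -> bool:
    # Padding beyond the largest captured size is invalid, so bound tokens first.
    if num_actual_tokens > max_capture_size:
        return False
    # Padding can inflate the effective batch size past num_spec_decodes.
    # E.g. (illustrative numbers): num_spec = 7, decode_cudagraph_max_bs = 24,
    # and 24 sequences -> 192 tokens; if padding rounds 192 up to 256, then
    # 256 // 8 = 32 > 24, overflowing tensors sized for 24 sequences.
    batch_size = pad_for_cudagraph(num_actual_tokens) // (num_spec + 1)
    return batch_size <= decode_cudagraph_max_bs
```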
I think keeping m.num_actual_tokens <= self.decode_cudagraph_max_bs is good. Then the num_spec_decodes <= self.decode_cudagraph_max_bs check seems unnecessary.
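A sketch of the simplification proposed here (variable names assumed from the thread): every spec-decode sequence contributes at least one token, so num_actual_tokens >= num_spec_decodes always holds and the token-count bound subsumes the sequence-count bound.

```python
# Illustrative: the token bound alone implies the sequence bound, since
# num_actual_tokens >= num_spec_decodes for any batch.
use_graph = (use_full_cuda_graph and num_prefills == 0 and num_decodes == 0
             and num_actual_tokens <= decode_cudagraph_max_bs)
```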
Around 20 lines below there is one more similar place.
LucasWilkinson left a comment
This makes sense to me, assuming @vadiklyutiy confirms the fix.
Longer term, hopefully #23789 / #24002 will resolve this more broadly, but this makes sense as a temporary fix 👍
Are you referring to: "I think that's safe due to …"?
agree
… sizes (vllm-project#24660) (vllm-project#24667) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
…-project#24660) (vllm-project#24667) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
… sizes (vllm-project#24660) (vllm-project#24667) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…ge batch sizes (vllm-project#24660) (vllm-project#24667)" This reverts commit 89da8d9.
…rge batch sizes (vllm-project#24660) (vllm-project#24667)" This reverts commit 02da9a5.
…under large batch sizes (vllm-project#24660) (vllm-project#24667)"" This reverts commit a1124c4.
… under large batch sizes (vllm-project#24660) (vllm-project#24667)"" This reverts commit 3a72536.
… sizes (vllm-project#24660) (vllm-project#24667) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>