[Perf] Increase default max splits for FA3 full cudagraphs #25495

LucasWilkinson · 2025-09-23T17:25:55Z

#25274 provides evidence that 32 would be a much better default.

With full cudagraphs (full-CG) potentially becoming the default in #25444, this seems like a good time to improve it.

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

gemini-code-assist

Code Review

This pull request increases the default value for VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH from 16 to 64, aiming to improve performance for FA3 full cudagraphs. The change is applied consistently in both the type-checking block and the runtime environment variable definition. While the change itself is straightforward, I've identified a maintainability issue with the duplicated default value. I've left a comment suggesting to use a constant to avoid potential inconsistencies in the future.

gemini-code-assist · 2025-09-23T17:26:51Z

vllm/envs.py

    lambda: int(os.getenv("VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH",
-                          "16")),
+                          "64")),


The default value '64' is hardcoded here and also in the TYPE_CHECKING block at line 122. This duplication can lead to inconsistencies if the value is updated in one place but not the other. To improve maintainability and prevent potential bugs, consider defining this default value as a constant and referencing it in both locations. For example, you could add _DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH = 64 at the module level and use this constant here and at line 122.
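A minimal sketch of the suggested refactor, assuming the env-var table in vllm/envs.py is a dict of zero-argument callables as the diff above implies. The constant name is taken from the suggestion; the value 32 follows the later "change to 32" commit on this PR, and `get_max_num_splits` is a hypothetical helper added only so the snippet can be exercised without vLLM installed.

```python
import os
from typing import TYPE_CHECKING, Any, Callable

# Single source of truth for the default (name from the review suggestion;
# 32 is assumed here because the PR was later changed to 32).
_DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH = 32

ENV_NAME = "VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH"

if TYPE_CHECKING:
    # Static declaration seen by type checkers; reuses the same constant.
    VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH: int = (
        _DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH)

# Runtime table: each entry is evaluated lazily when the variable is read.
environment_variables: dict[str, Callable[[], Any]] = {
    ENV_NAME:
    lambda: int(
        os.getenv(ENV_NAME,
                  str(_DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH))),
}


def get_max_num_splits() -> int:
    """Hypothetical helper: resolve the setting from the environment."""
    return environment_variables[ENV_NAME]()
```

With this shape, a future change to the default touches one line, and users can still override it by exporting VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH before starting the process.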

mgoin

LGTM, thanks!

increase default splits

cc20802

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

gemini-code-assist bot reviewed Sep 23, 2025

change to 32

1ae089a

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

mgoin added the performance and ready labels Sep 23, 2025

mgoin added this to the v0.11.0 milestone Sep 23, 2025

mgoin approved these changes Sep 23, 2025

mgoin enabled auto-merge (squash) September 23, 2025 23:52

vllm-bot merged commit e0b24ea into main Sep 23, 2025
55 of 56 checks passed

vllm-bot deleted the lwilkinson/increase-default-splits branch September 23, 2025 23:53

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

fb97429

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (#25495)

d562c2e

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>

gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

a7577a5

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: gaojc <1055866782@qq.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

37f331f

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

1eb1182

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

10dfb1b

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Perf] Increase default max splits for FA3 full cudagraphs (vllm-proj…

871b117

…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>


4 participants
