
Conversation

@LucasWilkinson (Collaborator) commented Sep 23, 2025

#25274 provides evidence that 32 would be a much better default.

With full cudagraphs potentially becoming the default (#25444), this seems like a good time to improve it.
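For reference, a minimal sketch of how this default interacts with a user override (the env var name is the real one from this PR; the lookup mirrors the os.getenv pattern in the vllm/envs.py diff quoted in the review below):

```python
import os

# The environment variable wins when set; otherwise the new default (64)
# applies. This mirrors the os.getenv pattern from vllm/envs.py.
max_num_splits = int(
    os.getenv("VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH", "64")
)
print(max_num_splits)  # prints 64 unless the variable is set
```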

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request increases the default value of VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH from 16 to 64, aiming to improve performance for FA3 full cudagraphs. The change is applied consistently in both the type-checking block and the runtime environment variable definition. While the change itself is straightforward, the duplicated default value is a maintainability issue; I've left a comment suggesting the use of a constant to avoid potential inconsistencies in the future.

vllm/envs.py (Outdated)

Comment on lines 1014 to 1015:

```diff
 lambda: int(os.getenv("VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH",
-            "16")),
+            "64")),
```

Severity: high

The default value `"64"` is hardcoded here and also in the TYPE_CHECKING block at line 122. This duplication can lead to inconsistencies if the value is updated in one place but not the other. To improve maintainability and prevent potential bugs, consider defining the default as a module-level constant and referencing it in both locations, e.g. `_DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH = 64`, used both here and at line 122.
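A minimal sketch of the suggested refactor (the constant name is the one proposed above; the surrounding structure of envs.py is assumed here, not copied verbatim from the repo):

```python
import os
from typing import TYPE_CHECKING, Any, Callable

# Single source of truth for the default, as suggested above.
_DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH = 64

if TYPE_CHECKING:
    # Static stub reuses the constant instead of repeating the literal.
    VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH: int = (
        _DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH
    )

# Runtime definition; the lambda re-reads the environment on each access.
environment_variables: dict[str, Callable[[], Any]] = {
    "VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH": lambda: int(
        os.getenv(
            "VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH",
            str(_DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH),
        )
    ),
}
```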

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@mgoin added the performance and ready labels Sep 23, 2025
@mgoin added this to the v0.11.0 milestone Sep 23, 2025
@mgoin (Member) left a comment

LGTM, thanks!

@mgoin enabled auto-merge (squash) September 23, 2025 23:52
@vllm-bot merged commit e0b24ea into main Sep 23, 2025
55 of 56 checks passed
@vllm-bot deleted the lwilkinson/increase-default-splits branch September 23, 2025 23:53
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

performance: Performance-related issues
ready: ONLY add when PR is ready to merge/full CI is needed


4 participants