[Perf] Increase default max splits for FA3 full cudagraphs #25495
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Code Review
This pull request increases the default value for VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH from 16 to 64, aiming to improve performance for FA3 full cudagraphs. The change is applied consistently in both the type-checking block and the runtime environment variable definition. While the change itself is straightforward, I've identified a maintainability issue with the duplicated default value. I've left a comment suggesting to use a constant to avoid potential inconsistencies in the future.
vllm/envs.py
Outdated
  lambda: int(os.getenv("VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH",
-                       "16")),
+                       "64")),
The default value '64' is hardcoded here and also in the TYPE_CHECKING block at line 122. This duplication can lead to inconsistencies if the value is updated in one place but not the other. To improve maintainability and prevent potential bugs, consider defining this default value as a constant and referencing it in both locations. For example, you could add _DEFAULT_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH = 64 at the module level and use this constant here and at line 122.
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
LGTM, thanks!
…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: gaojc <1055866782@qq.com>
…ect#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
#25274 provides evidence that 32 would be a much better default.
With full CUDA graphs (full-CG) potentially becoming the default in #25444, this seems like a good time to improve it.