[FA/Chore] Bump FA version #26109
Conversation
Code Review
This pull request updates the vllm-flash-attn dependency to a newer commit to incorporate fixes for building with recent CUDA versions. The change is correct and necessary. My review includes one suggestion to improve the long-term maintainability of the build configuration by adding a comment to explain the purpose of the new commit hash, as using raw commit hashes without context can make future updates and debugging more difficult.
        vllm-flash-attn
        GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-       GIT_TAG 4695e6bed5366c41e28c06cd86170166e4f43d00
+       GIT_TAG 45b5dac5497e848d788af686bb68cbe5cf2e56bc
Using a raw commit hash for a dependency makes it difficult to understand what version is being used and why it was chosen. This can complicate future maintenance and updates. It is a best practice to use version tags for dependencies. If a tag is not available for this commit, please add a comment explaining what this commit hash corresponds to (e.g., what features or fixes it includes) and why it was chosen. This will improve the maintainability of the build system.
# Pick up fixes for CUDA 12.4 (GH#95) and 12.5 (GH#96)
GIT_TAG 45b5dac5497e848d788af686bb68cbe5cf2e56bc
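For readers following along, here is a minimal, self-contained sketch of how a dependency pinned to a commit hash can be declared with CMake's FetchContent, with the reviewer's suggested comment placed next to the hash. The repository URL, commit, and comment text come from the diff and suggestion above; the project scaffolding and the use of FetchContent_MakeAvailable are illustrative assumptions, not vLLM's actual build setup.

```cmake
# Sketch only: a standalone CMake file that pins vllm-flash-attn to a specific
# commit and documents why that commit was chosen. vLLM's real build files may
# wire this up differently; configuring the fetched project also needs a CUDA
# toolchain, which this sketch does not set up.
cmake_minimum_required(VERSION 3.24)
project(fa_pin_sketch LANGUAGES NONE)  # hypothetical project name

include(FetchContent)

FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  # Pick up fixes for CUDA 12.4 (GH#95) and 12.5 (GH#96)
  GIT_TAG 45b5dac5497e848d788af686bb68cbe5cf2e56bc
)

# Downloads the pinned commit and adds its CMake project to this build.
FetchContent_MakeAvailable(vllm-flash-attn)
```

Pinning to an exact commit keeps builds reproducible, and the inline comment gives the next person bumping the tag the context the reviewer asked for.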
super! thank you guys!
@LucasWilkinson @ProExpertProg @mgoin could you review this? #26098
@ProExpertProg I don't know why it is failing; I'm using it on my GH200/Thor and it compiles and works well.
@LucasWilkinson can you make sense of the logs? I can't relate them to the changes that are being picked up.
Can someone try to reproduce the CI failures? It might just be flaky tests.
@ProExpertProg I installed the fork from source (to really make it pick up the FA3 changes) and reran the two tests that failed:
So I think this PR is not the issue here. Can we restart the tests or override the failure?
Actually, the Distributed Tests (2 GPUs) failure looks different from the ones on the main PR: https://buildkite.com/vllm/ci/builds/33628/steps/canvas?jid=0199b666-16af-49aa-b2bd-5e5c5f188375
And this commit that landed on main should have fixed it: https://buildkite.com/vllm/ci/builds/34222/steps/canvas?jid=0199c900-b19d-4a2c-ab37-15327e7d63b0
This pull request has merge conflicts that must be resolved before it can be merged.
superseded by: #25049
Pull request was closed
Bump vLLM Flash Attention to pick up:
vllm-project/flash-attention#96
vllm-project/flash-attention#95