
Conversation

@WoosukKwon (Collaborator)

No description provided.

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

github-actions bot commented Aug 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Aug 6, 2025

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request updates the flash-attention dependency to a newer commit, with the stated goal of picking up attention-sink support in FlashAttention v3. Updating the dependency is a necessary step, but the PR appears incomplete: there are no corresponding changes in the vLLM codebase to configure or use the new attention-sink functionality. The implementation still uses a sliding window with a hardcoded sink size of 0, so the primary objective of the PR is not met.
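For context, an "attention sink" gives each head an extra learned logit that competes in the softmax and can absorb probability mass, so attention is not forced onto real tokens. Below is a minimal, self-contained PyTorch sketch of that math, purely for illustration; it is not vLLM's kernel implementation, and the shapes are chosen for readability:

```python
import torch

def attention_with_sink(q, k, v, sink):
    """Single-query attention with a learned per-head sink logit.

    q: (heads, dim); k, v: (heads, seq, dim); sink: (heads,)
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("hd,hsd->hs", q, k) * scale
    # Append the sink logit as a virtual extra position for each head.
    logits = torch.cat([scores, sink[:, None]], dim=-1)
    probs = torch.softmax(logits, dim=-1)
    # Discard the sink's probability mass: it contributes no value vector.
    return torch.einsum("hs,hsd->hd", probs[:, :-1], v)

# With sink logits of 0, each head's scores compete against exp(0) = 1
# in the softmax denominator, uniformly damping the token weights.
q, k, v = torch.randn(8, 64), torch.randn(8, 16, 64), torch.randn(8, 16, 64)
out = attention_with_sink(q, k, v, sink=torch.zeros(8))
```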

```diff
 vllm-flash-attn
 GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-GIT_TAG 1c2624e53c078854e0637ee566c72fe2107e75f4
+GIT_TAG b99f8c821771fd11feb66d5c89661e9858fde359
```
@gemini-code-assist bot left an inline comment on the hunk above (severity: high):

This change updates the flash-attention dependency, and the PR title suggests it is meant to enable attention-sink functionality. However, the rest of the codebase does not appear to be updated to use the feature. For example, in vllm/attention/backends/flash_attn.py, the sliding_window parameter for FlashAttention is initialized with a sink size of 0, so even with the updated dependency the attention-sink feature will not be active. To fulfill the goal of this PR, code changes are needed to configure and use the attention sink size; without them, the PR is incomplete.
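To make the reviewer's point concrete, here is a hedged sketch of the kind of plumbing that would still be needed: threading a per-head sink tensor from the attention backend into the updated kernel. Everything below is an assumption for illustration — in particular the keyword name s_aux for the sink logits and the exact set of arguments — not the actual vLLM or vllm-flash-attn change:

```python
import torch
# Import path follows vLLM's vendored build of the flash-attention fork.
from vllm.vllm_flash_attn import flash_attn_varlen_func

def fa3_with_sink(q, k, v, cu_seqlens_q, cu_seqlens_k,
                  max_seqlen_q, max_seqlen_k, sink_logits):
    # Hypothetical: pass per-head sink logits (shape: (num_heads,)) to the
    # kernel instead of leaving the sink size hardcoded to 0. The kwarg
    # name `s_aux` is an assumption about the new FA3 interface.
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens_q,
        cu_seqlens_k=cu_seqlens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
        causal=True,
        s_aux=sink_logits,
    )
```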

@WoosukKwon merged commit e3c876d into main Aug 6, 2025
10 of 14 checks passed
@WoosukKwon deleted the woosuk/fa3-attn-sink branch August 6, 2025 04:36
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
myselvess pushed a commit to myselvess/vllm that referenced this pull request Aug 7, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Noam Gat <noamgat@gmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>