Upgrade FA3 for attention sink #22313
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Code Review
This pull request updates the flash-attention dependency to a newer commit, with the stated goal of upgrading to a version that supports attention sinks in FlashAttention v3. While updating the dependency is a necessary step, the pull request appears to be incomplete. There are no corresponding code changes within the vLLM codebase to actually configure or utilize the new attention sink functionality. The implementation seems to continue using a sliding window with a hardcoded sink size of 0, which means the primary objective of this PR is not met.
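As a concrete illustration of the missing piece, here is a minimal sketch of how per-head sink logits might be forwarded to the kernel. It assumes the upgraded vllm-flash-attn build accepts them through a keyword argument (shown here as `s_aux`); the argument name, import path, and wiring are assumptions for illustration, not the confirmed API or the actual vLLM change.

```python
# Hedged sketch only: assumes the upgraded vllm-flash-attn FA3 build accepts
# per-head sink logits via a keyword argument (called `s_aux` here).
# Argument name, import path, and wiring are illustrative assumptions.
from typing import Optional

import torch

from vllm.vllm_flash_attn import flash_attn_varlen_func  # import path assumed


def varlen_attention_with_sinks(
    q: torch.Tensor,            # [total_q_tokens, num_heads, head_dim]
    k: torch.Tensor,            # [total_kv_tokens, num_kv_heads, head_dim]
    v: torch.Tensor,            # [total_kv_tokens, num_kv_heads, head_dim]
    cu_seqlens_q: torch.Tensor,
    cu_seqlens_k: torch.Tensor,
    max_seqlen_q: int,
    max_seqlen_k: int,
    scale: float,
    sinks: Optional[torch.Tensor] = None,  # [num_heads] learned sink logits
) -> torch.Tensor:
    extra_kwargs = {}
    if sinks is not None:
        # Only pass the sink logits when the model defines them, so the call
        # stays compatible with models that have no attention sink.
        extra_kwargs["s_aux"] = sinks
    return flash_attn_varlen_func(
        q,
        k,
        v,
        cu_seqlens_q=cu_seqlens_q,
        cu_seqlens_k=cu_seqlens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
        softmax_scale=scale,
        causal=True,
        **extra_kwargs,
    )
```

Something along these lines, plus plumbing the sink tensor from the model definition down to the attention backend, is what the kernel upgrade alone does not provide.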
  vllm-flash-attn
      GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-     GIT_TAG 1c2624e53c078854e0637ee566c72fe2107e75f4
+     GIT_TAG b99f8c821771fd11feb66d5c89661e9858fde359
This change updates the flash-attention dependency, and the pull request title suggests this is for enabling 'attention sink' functionality. However, the rest of the codebase does not seem to be updated to use this feature. For example, in vllm/attention/backends/flash_attn.py, the sliding_window parameter for FlashAttention is initialized with a sink size of 0. This means that even with the updated dependency, the attention sink feature will not be active. To fulfill the goal of this PR, code changes are required to configure and utilize the attention sink size. Without them, this PR is incomplete.
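For context, the pattern this comment points at looks roughly like the sketch below. It is a simplified paraphrase under the assumption stated in the comment, not a verbatim excerpt of the backend: the sliding window is normalized into a (left, right) tuple and nothing about sinks is recorded or forwarded to the kernel, so the new dependency by itself changes nothing at runtime.

```python
# Simplified paraphrase of the backend behaviour the comment refers to;
# class and attribute names are illustrative, not an exact copy of
# vllm/attention/backends/flash_attn.py.
from typing import Optional, Tuple


class FlashAttentionImplSketch:
    def __init__(self, sliding_window: Optional[int] = None) -> None:
        # The window is stored as a (left, right) tuple for the kernel;
        # (-1, -1) means "no sliding window".
        self.sliding_window: Tuple[int, int] = (
            (sliding_window - 1, 0) if sliding_window is not None else (-1, -1)
        )
        # No sink size or sink logits are configured here, which is the gap
        # the review comment calls out: the upgraded kernel would need to
        # receive a sink tensor (or size) for the feature to take effect.
        self.sinks = None
```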
No description provided.