[Attention] Refactor CUDA attention backend selection logic #24794
Open
MatthewBonanni wants to merge 105 commits into vllm-project:main from MatthewBonanni:backend_selection_refactor
+1,300 −1,012
105 commits
All commits are by MatthewBonanni. Each line gives the commit SHA followed by its message.
eb34c8d add support methods to abstract
87edf38 remove is_attn_backend_supported
fc493ae all backends are V1 now
9618979 use backend_to_class_str
8aeb461 add MLA backend support details
eb8426f use backend_to_class_str
aba576c add support details for standard attention backends
ff18a9a update cuda logic
eaed800 Merge branch 'main' into backend_selection_refactor
9687c99 Merge branch 'main' into backend_selection_refactor
df49484 fix pre-commit
ff5ad7c fix argument mismatch
712ae59 fix pre-commit
97e1a2c use block size literals
8f86714 replace backend_name_to_enum with direct calls
50596d8 use DeviceCapability objects
03f6963 update max
3bee84e Fix block size adjustment
a716f3a Merge branch 'main' into backend_selection_refactor
15234bb Merge branch 'main' into backend_selection_refactor
2433669 split priorities by capability, update flashinfer min capability
a3617d7 change to typing imports
81d1b7b backends specify their required kv cache layout
adaf53b flashinfer supports up to 12.1
d1f1362 is_mla is false in base class
abb8375 triton supports fp8
85d8719 use CacheDType
1ef0417 add todo
a2c902f Merge branch 'main' into backend_selection_refactor
16f9373 is_quantized_kv_cache use CacheDType
8474a14 fix supports_sink
62e6290 fix priority list
22dd1b8 fix FA block sizes
4bf076d Merge branch 'main' into backend_selection_refactor
121d442 fix import failure
963cc9f fix import error
778cd98 Merge branch 'main' into backend_selection_refactor
de3f302 fix import error
bc10bee fix import
05aab3e fix type error
7936c47 add flashmla support test
4f0f955 clean up head size validation
feded36 Merge branch 'main' into backend_selection_refactor
d8b8043 use KVCacheLayoutType
a3ccbba move selector layout change to same place as block size change
3285c2c MLA only supports head size 576
6eab504 fix kv_cache_dtype support logic
5523dac fix test
58fc888 skip FA MLA if test is run on hardware where it's not supported
17fd954 fix test
2b23712 fix pre-commit
68a63b7 Merge branch 'main' into backend_selection_refactor
fc1d3f3 fix head size
ecdef49 fix pre-commit
9008e56 flashinfer_mla only support blackwell (only uses TRTLLM kernels)
b756ceb compute capability checks
afccece remove reference to backend_name_to_enum
33cb1ef fix default block size
3f5439e improve logs
75fce85 fix block size support
ba51339 fix getting priority list
d49fbf9 remove redundant block size methods
dd31329 Merge branch 'main' into backend_selection_refactor
b18a193 fix import
0e0cb6d raise error instead of implicitly changing backend
1a7b366 Merge branch 'main' into backend_selection_refactor
f147663 Merge branch 'main' into backend_selection_refactor
1eefe90 don't ignore block size
97bee04 move block_size update back to check_and_update_config
0812fac fix import
ec39247 address missing case
e6497dd Merge branch 'main' into backend_selection_refactor
860bfdb fix flashmla_sparse support
df1cd64 fix hybrid models
758b3a5 Merge branch 'main' into backend_selection_refactor
01b43ff return only mla or non-mla priorities
ee894ea cleanup
842e89b skip test on hopper
bd190e7 temp: apply fixes for test
5bf94f6 Revert "skip test on hopper"
7e34939 revert to old check_and_update_config block_size logic
3b1e92f Revert "temp: apply fixes for test"
54dffe2 Merge branch 'main' into backend_selection_refactor
db6cc0f Merge branch 'main' into backend_selection_refactor
d34eb77 add test_attention_selector to Blackwell Tests
48290ee rename _Backend to AttentionBackendEnum, add class methods
1c71eab get rid of get_min_compute_capability and get_max_compute_capability
6e9d1f1 fix pre-commit
d3cdda7 change methods to properties
925069c device_capability not None
a0b56c5 query device_capability inside get_required_kv_cache_layout
fff453a Update vllm/attention/backends/abstract.py
95aae78 Merge branch 'main' into backend_selection_refactor
530f356 class_path always None in decorator
933ee5f type hint for value
255edc9 restore comment
6af36aa Merge branch 'main' into backend_selection_refactor
c9d62f8 fix docs
93a0770 Merge branch 'main' into backend_selection_refactor
f6a5a32 add FLASHMLA_SPARSE to priority list
bc91050 Merge branch 'main' into backend_selection_refactor
0435eca fix test
a098d82 fix flashmla_sparse
d8215e0 Merge branch 'main' into backend_selection_refactor
4452f5f fix pre-commit
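The commit messages outline the general shape of the refactor: each backend describes what it supports (block sizes, KV-cache dtypes, head sizes, a compute-capability range, MLA or not), the selector walks a priority list that depends on device capability and returns only MLA or only non-MLA candidates, and an explicitly requested backend that cannot serve the configuration now raises an error instead of being silently replaced (commit 0e0cb6d). The following is a minimal, hypothetical Python sketch of that pattern. Every name in it (BackendSpec, unsupported_reasons, priority_list, select_backend, SPECS, and the three backend names) and every support value is an illustrative assumption, not vLLM's actual API or this PR's implementation; only the MLA head size of 576 comes from commit 3285c2c.

```python
from dataclasses import dataclass
from typing import Optional

Capability = tuple[int, int]  # (major, minor), e.g. (9, 0)


@dataclass(frozen=True)
class BackendSpec:
    """Hypothetical summary of what one attention backend supports."""

    name: str
    block_sizes: tuple[int, ...]
    kv_cache_dtypes: tuple[str, ...]
    head_sizes: tuple[int, ...]
    min_capability: Capability
    max_capability: Capability
    is_mla: bool = False

    def unsupported_reasons(self, block_size: int, kv_cache_dtype: str,
                            head_size: int, capability: Capability,
                            use_mla: bool) -> list[str]:
        """Return why this backend cannot serve the config (empty = usable)."""
        reasons = []
        if self.is_mla != use_mla:
            reasons.append("MLA mode mismatch")
        if block_size not in self.block_sizes:
            reasons.append(f"block size {block_size} unsupported")
        if kv_cache_dtype not in self.kv_cache_dtypes:
            reasons.append(f"kv cache dtype {kv_cache_dtype!r} unsupported")
        if head_size not in self.head_sizes:
            reasons.append(f"head size {head_size} unsupported")
        # Tuples compare lexicographically, so (8, 9) < (9, 0) < (12, 1).
        if not self.min_capability <= capability <= self.max_capability:
            reasons.append(f"compute capability {capability} out of range")
        return reasons


# Hypothetical specs with placeholder values (MLA head size 576 per 3285c2c).
SPECS = {
    s.name: s
    for s in (
        BackendSpec("NEWER_GPU_BACKEND", (16, 32, 64), ("auto", "fp8"),
                    (64, 128), (10, 0), (12, 1)),
        BackendSpec("PORTABLE_BACKEND", (16,), ("auto",),
                    (64, 128, 256), (8, 0), (12, 1)),
        BackendSpec("MLA_BACKEND", (64,), ("auto",),
                    (576,), (9, 0), (12, 1), is_mla=True),
    )
}


def priority_list(capability: Capability, use_mla: bool) -> list[str]:
    """Return only MLA or only non-MLA candidates, ordered per capability."""
    if use_mla:
        return ["MLA_BACKEND"]
    if capability >= (10, 0):
        return ["NEWER_GPU_BACKEND", "PORTABLE_BACKEND"]
    return ["PORTABLE_BACKEND"]


def select_backend(requested: Optional[str], **config) -> str:
    """Validate an explicit request, else take the first supported candidate."""
    if requested is not None:
        reasons = SPECS[requested].unsupported_reasons(**config)
        if reasons:
            # Fail loudly instead of silently switching to another backend.
            raise ValueError(f"{requested}: " + "; ".join(reasons))
        return requested
    for name in priority_list(config["capability"], config["use_mla"]):
        if not SPECS[name].unsupported_reasons(**config):
            return name
    raise ValueError("no attention backend supports this configuration")
```

For example, with cfg = dict(block_size=16, kv_cache_dtype="auto", head_size=128, capability=(9, 0), use_mla=False), select_backend(None, **cfg) returns "PORTABLE_BACKEND", the same call with capability=(10, 0) returns "NEWER_GPU_BACKEND", and select_backend("NEWER_GPU_BACKEND", **cfg) raises ValueError because (9, 0) is below that spec's minimum. Collecting every failed check as a reason string makes the error for an explicit request self-explanatory, which is the point of refusing to fall back silently.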