[Attention] Move Backend enum into registry #25893
Conversation
Code Review
This pull request is a good step in refactoring the attention backend by moving the _Backend enum to a centralized registry. The changes are consistent and improve the code structure. I've found one area for improvement in vllm/envs.py where a dynamic import is no longer necessary and can be simplified, which will enhance readability and maintainability.
Force-pushed from fcc6c0e to 16a1f40
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Force-pushed from 16a1f40 to 7c31e39
Hey @MatthewBonanni thanks a lot for splitting this up, it's going to be much easier to review!
Evolving the registry into something more dynamic (for OOT plugins) makes sense with this structure imo.
Just one question: what is this PR going to enable as a next step? I.e., what's the first benefit the backend registry unlocks?
This change looks fairly safe to me, pinging @mgoin too.
import enum
…
class _Backend(enum.Enum):
ok I suppose this is where @ILikeIneine would like to plug in a different attn backend.
I think we can add that later, although this "registry" here is a very manual one: usually registries expose a way to get added; here it's just a plain enum for now.
Yes, I see what you're saying. We could replace the enum with some other object, but the ergonomics of the enum are quite nice. Maybe we could construct an enum at import time?
Checked out the other PR; I think it's fine as is, and we can add the actual registration mechanism in the next PR.
@NickLucche thanks for your review! The next step is shown in #25900, in which backends can register themselves in the …
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
LGTM; one nit, thanks for doing this!
import enum
…
class _Backend(enum.Enum):
For a future PR: I think it might be nice to make this enum platform specific?
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@LucasWilkinson thanks for your review! I'll definitely figure out how to make distinctions between platforms in the future. If some backends support multiple platforms (not sure if this is the case), it could be better to add a …
### What this PR does / why we need it?
This is step 1 of refactoring code to adapt to vllm main; this PR is aligned with vllm-project/vllm@17c540a.
1. Refactor deepseek to the latest code arch as of vllm-project/vllm@17c540a
2. Bunches of fixes due to vllm changes
- Fix `AscendScheduler` `__post_init__`, caused by vllm-project/vllm#25075
- Fix `AscendScheduler` init got an unexpected arg `block_size`, caused by vllm-project/vllm#26296
- Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by vllm-project/vllm#23485
- Fix `MLAAttention` import, caused by vllm-project/vllm#25103
- Fix `SharedFusedMoE` import, caused by vllm-project/vllm#26145
- Fix `LazyLoader` import, caused by vllm-project/vllm#27022
- Fix `vllm.utils.swap_dict_values` import, caused by vllm-project/vllm#26990
- Fix `Backend` enum import, caused by vllm-project/vllm#25893
- Fix `CompilationLevel` renaming to `CompilationMode` issue introduced by vllm-project/vllm#26355
- Fix fused_moe ops, caused by vllm-project/vllm#24097
- Fix bert model because of `inputs_embeds`, caused by vllm-project/vllm#25922
- Fix MRope because of `get_input_positions_tensor` to `get_mrope_input_positions`, caused by vllm-project/vllm#24172
- Fix `splitting_ops` changes introduced by vllm-project/vllm#25845
- Fix multi-modality changes introduced by vllm-project/vllm#16229
- Fix lora bias dropping issue introduced by vllm-project/vllm#25807
- Fix structured output break introduced by vllm-project/vllm#26737
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
CI passed with existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
Purpose
Creates `vllm/attention/backends/registry.py`, containing the `_Backend` enum. This PR follows #25489 and is a component of the larger attention backend refactor (now being split up), #24794. This file will contain more utility functions in the future.
Test Plan
CI should be sufficient
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.