[UX] Fail if an invalid attention backend is specified #22217
Conversation
Signed-off-by: mgoin <michael@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request correctly changes the behavior to fail fast when an invalid attention backend is specified via an environment variable, rather than silently falling back to a default. The implementation is straightforward and the tests are updated accordingly to verify the new exception-raising behavior. I've identified one potential issue related to caching that could cause stale environment variable values to be used, which could lead to unexpected behavior.
```python
if selected_backend is None:
    raise ValueError(
        f"Invalid attention backend: '{backend_by_env_var}'. "
        f"Valid backends are: {list(_Backend.__members__.keys())}")
```
The _cached_get_attn_backend function is decorated with @cache, which memoizes its results. However, it reads envs.VLLM_ATTENTION_BACKEND directly. This can lead to incorrect behavior if the environment variable is changed while the process is running, as a cached result based on a stale value of the environment variable might be returned.
A similar issue is already addressed for VLLM_USE_V1 in the get_attn_backend wrapper function, where the environment variable is read and passed as an argument to the cached function. A similar approach should be taken for VLLM_ATTENTION_BACKEND to ensure that changes to the backend configuration are always respected.
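A minimal sketch of that approach, using simplified stand-in functions rather than vLLM's actual selector code: the wrapper reads the environment variables on every call and passes them as arguments, so they become part of the cache key and a stale value cannot be returned.

```python
import os
from functools import cache


@cache
def _cached_get_attn_backend(backend_name, use_v1):
    # Both env-derived values are arguments, hence part of the cache key.
    return f"backend={backend_name}, v1={use_v1}"


def get_attn_backend():
    # Read the environment variables in the uncached wrapper on every call.
    backend_name = os.environ.get("VLLM_ATTENTION_BACKEND")
    use_v1 = os.environ.get("VLLM_USE_V1", "0") == "1"
    return _cached_get_attn_backend(backend_name, use_v1)
```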
DarkLight1337 left a comment:
Makes sense, thanks for improving the UX!
Plugin Tests are failing because of this PR: https://buildkite.com/vllm/ci/builds/26033/steps/canvas?sid=01987a40-050a-40d2-b36c-92e23ebf4d0c
Purpose
As the title states, I think that if an invalid attention backend is manually specified, e.g. VLLM_ATTENTION_BACKEND=INVALID, it should fail rather than fall back to some default.
Rob found that PR #21966 causes a fallback to V0 if that environment variable is set incorrectly.
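A hypothetical pytest-style check of the intended fail-fast behavior, using a stand-in validator rather than vLLM's actual selector or test code:

```python
import pytest

_VALID_BACKENDS = ["FLASH_ATTN", "XFORMERS", "TORCH_SDPA"]  # illustrative subset


def select_backend(name: str) -> str:
    # Stand-in for the validation added in this PR.
    if name not in _VALID_BACKENDS:
        raise ValueError(f"Invalid attention backend: '{name}'. "
                         f"Valid backends are: {_VALID_BACKENDS}")
    return name


def test_invalid_backend_raises():
    # Previously this silently fell back to a default; now it should error out.
    with pytest.raises(ValueError, match="Invalid attention backend"):
        select_backend("INVALID")
```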
Test Plan
CI
Test Result