
Conversation

@mgoin mgoin commented Aug 4, 2025

Purpose

As the title states, I think that if an invalid attention backend is manually specified, e.g. VLLM_ATTENTION_BACKEND=INVALID, vLLM should fail rather than fall back to some default.

Rob found that PR #21966 causes a fallback to V0 if that env variable is set incorrectly:

WARNING 08-04 15:00:03 [arg_utils.py:1771] VLLM_ATTENTION_BACKEND=CUTLASS_MLA_VLLM_V1 is not supported by the V1 Engine. Falling back to V0. We recommend to remove VLLM_ATTENTION_BACKEND=CUTLASS_MLA_VLLM_V1 from your config in favor of the V1 Engine.

Test Plan

CI
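
For illustration, this is roughly the kind of check CI could cover here. It is a minimal, self-contained sketch: the trimmed-down `_Backend` enum and the `select_backend_from_env` helper are stand-ins for vLLM's actual selector code, not the code added in this PR.

```python
import os
from enum import Enum

import pytest


class _Backend(Enum):
    # Trimmed-down stand-in for vLLM's backend enum; member names are illustrative.
    FLASH_ATTN = "FLASH_ATTN"
    XFORMERS = "XFORMERS"


def select_backend_from_env():
    """Mimics the fail-fast selection this PR argues for (sketch only)."""
    name = os.environ.get("VLLM_ATTENTION_BACKEND")
    if name is None:
        return None  # no override requested; normal auto-selection applies
    backend = _Backend.__members__.get(name)
    if backend is None:
        # Fail loudly instead of silently falling back to a default backend.
        raise ValueError(
            f"Invalid attention backend: '{name}'. "
            f"Valid backends are: {list(_Backend.__members__.keys())}")
    return backend


def test_invalid_backend_raises(monkeypatch):
    monkeypatch.setenv("VLLM_ATTENTION_BACKEND", "INVALID")
    with pytest.raises(ValueError, match="Invalid attention backend"):
        select_backend_from_env()
```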

Test Result

Signed-off-by: mgoin <michael@neuralmagic.com>

github-actions bot commented Aug 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mgoin mgoin changed the title from "Fail if an invalid attention backend is specified" to "[UX] Fail if an invalid attention backend is specified" Aug 4, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly changes the behavior to fail fast when an invalid attention backend is specified via an environment variable, rather than silently falling back to a default. The implementation is straightforward and the tests are updated accordingly to verify the new exception-raising behavior. I've identified one potential issue related to caching that could cause stale environment variable values to be used, which could lead to unexpected behavior.

Comment on lines +196 to +199
if selected_backend is None:
    raise ValueError(
        f"Invalid attention backend: '{backend_by_env_var}'. "
        f"Valid backends are: {list(_Backend.__members__.keys())}")

Severity: high

The _cached_get_attn_backend function is decorated with @cache, which memoizes its results. However, it reads envs.VLLM_ATTENTION_BACKEND directly. This can lead to incorrect behavior if the environment variable is changed while the process is running, as a cached result based on a stale value of the environment variable might be returned.

A similar issue is already addressed for VLLM_USE_V1 in the get_attn_backend wrapper function, where the environment variable is read and passed as an argument to the cached function. The same approach should be taken for VLLM_ATTENTION_BACKEND so that changes to the backend configuration are always respected.
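
A minimal sketch of the pattern described above, with simplified stand-in names rather than the actual vLLM selector code: the env-derived values are read in the uncached wrapper and forwarded as arguments, so the memoized result is keyed on them instead of on whatever the environment contained at the first call.

```python
import os
from functools import cache
from typing import Optional


@cache
def _cached_get_attn_backend(backend_name: Optional[str], use_v1: bool):
    # Env-derived values arrive as arguments, so they are part of the cache key
    # and a stale environment snapshot can never be silently reused.
    ...  # backend resolution would happen here


def get_attn_backend():
    # Read the environment on every call and forward the values, so a change
    # to VLLM_ATTENTION_BACKEND (or VLLM_USE_V1) between calls is respected.
    backend_name = os.environ.get("VLLM_ATTENTION_BACKEND")
    use_v1 = os.environ.get("VLLM_USE_V1", "1") == "1"
    return _cached_get_attn_backend(backend_name, use_v1)
```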

@DarkLight1337 DarkLight1337 left a comment

Makes sense, thanks for improving the UX!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 5, 2025 03:00
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 5, 2025
@vllm-bot vllm-bot merged commit e79a12f into vllm-project:main Aug 5, 2025
54 of 56 checks passed

DarkLight1337 commented Aug 5, 2025

@mgoin mgoin deleted the fail-invalid-attn branch August 5, 2025 22:50
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
myselvess pushed a commit to myselvess/vllm that referenced this pull request Aug 7, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants