[Attention]: Fix Torch compile error when --calculate-kv-scales is enable #23912
Code Review
This pull request aims to fix a torch.compile error caused by data-dependent branching when calculating KV scales. The approach of using torch._dynamo.disable() is correct. However, the implementation has some redundant code and a potential bug that could lead to an AttributeError. My review includes a suggestion to fix this and simplify the code.
0f2f704 to e5bc4eb
This code change ( It seems that the error is because torch compile is enabled by default in V1, and the test case for Anyway, this PR fixes the issue. Could you take some time to review it?
Dynamo reports this error:
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 08-28 14:27:49 [core.py:632] raise RuntimeError(
ERROR 08-28 14:27:49 [core.py:632] RuntimeError: Worker failed with error 'Data-dependent branching
ERROR 08-28 14:27:49 [core.py:632] Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow.
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
...
ERROR 08-28 14:27:49 [core.py:632] hidden_states = self.self_attn(
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/model_executor/models/qwen3.py", line 145, in forward
ERROR 08-28 14:27:49 [core.py:632] attn_output = self.attn(q, k, v)
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/attention/layer.py", line 239, in forward
ERROR 08-28 14:27:49 [core.py:632] if attn_metadata.enable_kv_scales_calculation:
ERROR 08-28 14:27:49 [core.py:632]
ERROR 08-28 14:27:49 [core.py:632] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
As the error message states, there is data-dependent branching in the attention code, so this fix stops Dynamo from compiling the KV-scale calculation code.
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
98e9c60 to 52a77fb
We should probably wrap this in a custom op instead of disabling Dynamo, unless that happens automatically? What's the problematic part of the KV-scales calculation?
Thanks for the review, @ProExpertProg. The problem is not in the KV-scale calculation; it is that Dynamo cannot trace a data-dependent branch, which happens in the following code: with this error: I tried torch.cond, but since I agree that the cleaner way is to refactor this into a custom op, but I just want a quick fix for this function first, and maybe find a better way later after more discussion with the author of this function.
    enable = bool(
        getattr(attn_metadata, 'enable_kv_scales_calculation', False))
    if enable:
        torch._dynamo.disable()(self.calc_kv_scales)(query, key, value)
This is not going to work: vLLM requires torch.compile with fullgraph=True, and torch._dynamo.disable induces a graph break.
I don't think this works; please see my comment in #24290 (comment).
See #21640 for info and to stay updated on the fix.
Purpose
Fix the torch.compile error raised when --calculate-kv-scales is enabled.
Dynamo reports this error:
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 08-28 14:27:49 [core.py:632] raise RuntimeError(
ERROR 08-28 14:27:49 [core.py:632] RuntimeError: Worker failed with error 'Data-dependent branching
ERROR 08-28 14:27:49 [core.py:632] Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow.
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.
...
ERROR 08-28 14:27:49 [core.py:632] hidden_states = self.self_attn(
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/model_executor/models/qwen3.py", line 145, in forward
ERROR 08-28 14:27:49 [core.py:632] attn_output = self.attn(q, k, v)
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/attention/layer.py", line 239, in forward
ERROR 08-28 14:27:49 [core.py:632] if attn_metadata.enable_kv_scales_calculation:
ERROR 08-28 14:27:49 [core.py:632]
ERROR 08-28 14:27:49 [core.py:632] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
As the error message states, there is data-dependent branching in the attention code, so this fix stops Dynamo from compiling the KV-scale calculation code.
Test Plan
vllm serve -tp 2 --kv-cache-dtype fp8 --calculate-kv-scales --enable-chunked-prefill --trust_remote_code /root/workspace/Model/Qwen3-4B/
Test Result
No error is reported during torch.compile.