[Attention]: Fix Torch compile error when --calculate-kv-scales is enable #23912

kzjeef · 2025-08-29T08:07:05Z

Purpose

Fix the torch.compile error that occurs when --calculate-kv-scales is enabled.

Dynamo reports the following error:

ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 08-28 14:27:49 [core.py:632] raise RuntimeError(
ERROR 08-28 14:27:49 [core.py:632] RuntimeError: Worker failed with error 'Data-dependent branching
ERROR 08-28 14:27:49 [core.py:632] Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow.
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.

...

ERROR 08-28 14:27:49 [core.py:632] hidden_states = self.self_attn(
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/model_executor/models/qwen3.py", line 145, in forward
ERROR 08-28 14:27:49 [core.py:632] attn_output = self.attn(q, k, v)
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/attention/layer.py", line 239, in forward
ERROR 08-28 14:27:49 [core.py:632] if attn_metadata.enable_kv_scales_calculation:
ERROR 08-28 14:27:49 [core.py:632]
ERROR 08-28 14:27:49 [core.py:632] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

As the error message states, the attention code contains a data-dependent branch, so this fix prevents Dynamo from compiling the KV scale calculation code.

Test Plan

vllm serve -tp 2 --kv-cache-dtype fp8 --calculate-kv-scales --enable-chunked-prefill --trust_remote_code /root/workspace/Model/Qwen3-4B/
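Once the server started by the command above is up, sending one request through its OpenAI-compatible API exercises the compiled attention path. The sketch below is not part of the PR. It assumes the default listen address http://localhost:8000 and that the served model name equals the path passed to vllm serve; `build_completion_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions (not from the PR): default host/port, model name == served path.
BASE_URL = "http://localhost:8000"
MODEL = "/root/workspace/Model/Qwen3-4B/"


def build_completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "max_tokens": max_tokens})
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send(build_completion_request("Hello, my name is"))["choices"][0]["text"]` should return generated text; if the worker hits the Dynamo error, the server fails before it can answer.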

Test Result

No error is reported during torch.compile.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request aims to fix a torch.compile error caused by data-dependent branching when calculating KV scales. The approach of using torch._dynamo.disable() is correct. However, the implementation has some redundant code and a potential bug that could lead to an AttributeError. My review includes a suggestion to fix this and simplify the code.

vllm/attention/layer.py

kzjeef · 2025-08-29T17:34:52Z

This code change (if ctx_attn_metadata.enable_kv_scales_calculation) was introduced by @mgoin in February 2025. I believe there was no torch compile error at that time, but it definitely occurs in the current version.

It seems the error arises because torch.compile is enabled by default in V1 and the calculate_kv_scales case slipped through the tests.

Anyway, this PR fixes the issue. Could you take some time to review it?
@youkaichao @LucasWilkinson

vllm/attention/layer.py

Signed-off-by: Asher Zhang <asherszhang@tencent.com>

youkaichao · 2025-09-04T12:49:17Z

cc @ProExpertProg @zou3519

ProExpertProg · 2025-09-04T13:50:28Z

We should probably wrap this in a custom op instead of disabling dynamo, unless that happens automatically? What's the problematic part of kv scales calculation?

kzjeef · 2025-09-05T03:03:16Z

> We should probably wrap this in a custom op instead of disabling dynamo, unless that happens automatically? What's the problematic part of kv scales calculation?

Thanks for the review, @ProExpertProg.

The problem is not in the KV scale calculation itself. Dynamo cannot trace a data-dependent branch, which occurs in the following code:

            attn_metadata = get_forward_context().attn_metadata
            if attn_metadata.enable_kv_scales_calculation:
                self.calc_kv_scales(query, key, value)

with this error:

 Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow. 
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.

I tried torch.cond, but since attn_metadata.enable_kv_scales_calculation is not a tensor, that made the code more error-prone. So I disabled torch.compile for this part of the code instead. Because the CUDA graph is kept, the speed was not affected in my tests.

I agree that the cleaner way is to refactor this into a custom op, but I want to fix this function quickly first and perhaps find a better approach later, after more discussion with the author of this function.

zou3519 · 2025-09-08T19:57:43Z

vllm/attention/layer.py

+            enable = bool(
+                getattr(attn_metadata, 'enable_kv_scales_calculation', False))
+            if enable:
+                torch._dynamo.disable()(self.calc_kv_scales)(query, key, value)


This is not going to work: vLLM requires torch.compile with fullgraph=True, and torch._dynamo.disable induces a graph break.

zou3519

I don't think this works, please see my comment in #24290 (comment)

ProExpertProg · 2025-09-22T19:24:01Z

See #21640 for info and to stay updated on the fix

gemini-code-assist bot reviewed Aug 29, 2025

kzjeef force-pushed the fix-fp8-kv-scale branch from 0f2f704 to e5bc4eb Compare August 29, 2025 08:14

elvischenv reviewed Sep 3, 2025

kzjeef force-pushed the fix-fp8-kv-scale branch from 98e9c60 to 52a77fb Compare September 4, 2025 07:40

ProExpertProg added the torch.compile label Sep 4, 2025

github-project-automation bot added this to torch.compile integration Sep 4, 2025

github-project-automation bot moved this to To triage in torch.compile integration Sep 4, 2025

ProExpertProg moved this from To triage to Needs Reproduction in torch.compile integration Sep 4, 2025

zou3519 requested review from ProExpertProg and zou3519 September 4, 2025 20:28

elvischenv mentioned this pull request Sep 8, 2025 in [Bugfix] guard missing attn_metadata in KV scales path #24290 (closed)

zou3519 reviewed Sep 8, 2025

zou3519 requested changes Sep 8, 2025

ProExpertProg closed this Sep 22, 2025

github-project-automation bot moved this from Needs Reproduction to Done in torch.compile integration Sep 22, 2025
