
@kzjeef
Contributor

@kzjeef kzjeef commented Aug 29, 2025

Purpose

Fix a torch.compile error that occurs when --calculate-kv-scales is enabled.

Dynamo reports the following error:
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 08-28 14:27:49 [core.py:632] raise RuntimeError(
ERROR 08-28 14:27:49 [core.py:632] RuntimeError: Worker failed with error 'Data-dependent branching
ERROR 08-28 14:27:49 [core.py:632] Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow.
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.

...

ERROR 08-28 14:27:49 [core.py:632] hidden_states = self.self_attn(
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/model_executor/models/qwen3.py", line 145, in forward
ERROR 08-28 14:27:49 [core.py:632] attn_output = self.attn(q, k, v)
ERROR 08-28 14:27:49 [core.py:632] File "/opt/conda/envs/torch-base/lib/python3.12/site-packages/vllm/attention/layer.py", line 239, in forward
ERROR 08-28 14:27:49 [core.py:632] if attn_metadata.enable_kv_scales_calculation:
ERROR 08-28 14:27:49 [core.py:632]
ERROR 08-28 14:27:49 [core.py:632] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

As the error message states, there is data-dependent branching in the attention code, so this fix stops Dynamo from compiling the KV-scale calculation code.

Test Plan

vllm serve -tp 2 --kv-cache-dtype fp8 --calculate-kv-scales --enable-chunked-prefill --trust_remote_code /root/workspace/Model/Qwen3-4B/

Test Result

No error is reported during torch.compile.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a torch.compile error caused by data-dependent branching when calculating KV scales. The approach of using torch._dynamo.disable() is correct. However, the implementation has some redundant code and a potential bug that could lead to an AttributeError. My review includes a suggestion to fix this and simplify the code.

@kzjeef
Contributor Author

kzjeef commented Aug 29, 2025

This code change (if ctx_attn_metadata.enable_kv_scales_calculation) was introduced by @mgoin in February 2025. I believe there was no torch.compile error at that time, but the error definitely occurs in the current version.

It seems the error appears because torch.compile is enabled by default in V1, and the calculate_kv_scales case was missed by the tests.

Anyway, this PR fixes the issue. Could you take some time to review it?
@youkaichao @LucasWilkinson


Signed-off-by: Asher Zhang <asherszhang@tencent.com>
@youkaichao
Member

cc @ProExpertProg @zou3519

@ProExpertProg
Collaborator

We should probably wrap this in a custom op instead of disabling dynamo, unless that happens automatically? What's the problematic part of kv scales calculation?

@kzjeef
Contributor Author

kzjeef commented Sep 5, 2025

We should probably wrap this in a custom op instead of disabling dynamo, unless that happens automatically? What's the problematic part of kv scales calculation?

Thanks for the review, @ProExpertProg.

The problem is not in the KV-scale calculation itself; it is that Dynamo cannot trace a data-dependent branch, which happens in the following code:

            attn_metadata = get_forward_context().attn_metadata
            if attn_metadata.enable_kv_scales_calculation:
                self.calc_kv_scales(query, key, value)

with this error:

Explanation: Detected data-dependent branching (e.g. if my_tensor.sum() > 0:). Dynamo does not support tracing dynamic control flow.
ERROR 08-28 14:27:49 [core.py:632] Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.

I tried torch.cond, but since attn_metadata.enable_kv_scales_calculation is not a tensor, that made the code more error-prone. So I disabled torch.compile for this part of the code instead. Since CUDA graphs are still kept, the speed was not affected in my tests.

I agree that a cleaner way is to refactor this into a custom op, but I wanted to quickly fix this function first, and perhaps find a better approach later after more discussion with the author of this function.

            enable = bool(
                getattr(attn_metadata, 'enable_kv_scales_calculation', False))
            if enable:
                torch._dynamo.disable()(self.calc_kv_scales)(query, key, value)
Collaborator


This is not going to work: vLLM requires torch.compile with fullgraph=True, and torch._dynamo.disable induces a graph break.

Collaborator

@zou3519 zou3519 left a comment


I don't think this works; please see my comment in #24290.

@github-project-automation github-project-automation bot moved this from Needs Reproduction to Done in torch.compile integration Sep 22, 2025
@ProExpertProg
Collaborator

See #21640 for more information and to stay updated on the fix.


Projects

Status: Done


5 participants