[Bugfix] Support compile for Transformers multimodal #23095
Conversation
Signed-off-by: raushan <raushan@huggingface.co>
Code Review
This pull request enables torch.compile for multimodal transformer models, which specifically addresses a compilation failure with the Qwen-VL model family. The fix involves specifying dynamic dimensions for arguments in the model's forward pass. While the change is small and targeted, I have a concern that the implementation might be too broad and could potentially impact other multimodal models. My review includes one high-severity comment detailing this concern.
@support_torch_compile(
    dynamic_arg_dims={
        "input_ids": 0,
        "positions": -1,
        "intermediate_tensors": 0,
        "inputs_embeds": 0,
    })  # set `positions` to last dim to support Qwen-mrope
This change hardcodes the dynamic dimension for the positions argument to -1. While the comment indicates this is to support Qwen-mrope, applying this configuration to the generic TransformersForMultimodalLM class may cause issues for other multimodal models.
For many models, the positions tensor has a shape of (num_tokens, 2), where the number of tokens is the dynamic dimension (dim 0). Marking dimension -1 as dynamic would instead target the feature dimension of size 2, potentially causing compilation or runtime errors for non-Qwen models that use this class.
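The dimension-index concern can be sketched in a few lines. The shapes below are assumptions drawn from this discussion (Qwen-mrope uses a 2-D positions tensor with the token dimension last; other layouts vary by model), not a survey of every model in vLLM:

```python
# Illustrative only: shows which dimension a dim index of -1 resolves to
# for a few hypothetical `positions` layouts mentioned in the review.
def resolve_dim(shape: tuple, dim: int) -> int:
    """Map a possibly negative dim index to its positive equivalent."""
    return dim % len(shape)

qwen_mrope_positions = (3, 128)  # assumed Qwen-mrope layout: (3, num_tokens)
plain_positions = (128,)         # typical 1-D layout: (num_tokens,)
paired_positions = (128, 2)      # the (num_tokens, 2) layout the reviewer describes

# dim=-1 marks the token dim for the Qwen-mrope layout (index 1) and still
# resolves to the token dim for a 1-D layout (index 0), but for a
# (num_tokens, 2) layout it would mark the fixed size-2 dim, not dim 0.
print(resolve_dim(qwen_mrope_positions, -1))  # 1
print(resolve_dim(plain_positions, -1))       # 0
print(resolve_dim(paired_positions, -1))      # 1, not the dynamic token dim
```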
To ensure this fix doesn't introduce regressions, consider making this configuration specific to Qwen-VL models. One approach could be to introduce a separate class for Qwen-VL that encapsulates this compilation behavior, and update the model loading logic to use it for the relevant models.
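One way to picture the suggested split is a per-family lookup for the dynamic dims. This is a self-contained sketch of the idea only; the helper name, model-type strings, and dict values are hypothetical and do not reflect vLLM's actual model-loading code:

```python
# Hypothetical sketch: route Qwen-VL models to Qwen-specific dynamic dims
# while the generic multimodal class keeps dim 0 for `positions`.
GENERIC_DYNAMIC_ARG_DIMS = {
    "input_ids": 0,
    "positions": 0,
    "intermediate_tensors": 0,
    "inputs_embeds": 0,
}

# Qwen-mrope needs the last dim of `positions` marked dynamic.
QWEN_MROPE_DYNAMIC_ARG_DIMS = {**GENERIC_DYNAMIC_ARG_DIMS, "positions": -1}

def dynamic_arg_dims_for(model_type: str) -> dict:
    """Pick compile-time dynamic dims by model family (illustrative helper)."""
    if model_type.startswith(("qwen2_vl", "qwen2_5_vl")):
        return QWEN_MROPE_DYNAMIC_ARG_DIMS
    return GENERIC_DYNAMIC_ARG_DIMS

print(dynamic_arg_dims_for("qwen2_vl")["positions"])  # -1
print(dynamic_arg_dims_for("llava")["positions"])     # 0
```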
Isotr0py left a comment
Thanks, LGTM!
As per the title, this ensures that compile is supported for all models. The only issue we had previously was with the Qwen-VL model family, and it is now fixed by marking the position ids' last dim as dynamic.
Apparently, on the `main` branch compile is already enabled because the base model has the decorator, so Qwen-VL simply fails to run if we don't enforce eager. Benchmark results on Vision-Arena for Qwen with and without compilation are below. vLLM is still much faster for this model, which I guess is due to vision attention. For other models like LLaVA there is no difference in throughput.
cc @hmellor @Isotr0py